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Abstract 

In eukaryotic genomes, nucleosomes function to compact DNA and to regulate 
access to it both by simple physical occlusion and by providing the substrate for 
numerous covalent epigenetic tags. While nucleosome positions in vitro are deter- 
mined by sequence alone, in vivo competition with other DNA-binding factors and 
action of chromatin remodeling enzymes play a role that needs to be quantified. 
We developed a biophysical model for the sequence dependence of DNA bending 
energies, and validated it against a collection of in vitro free energies of nucleo- 
some formation and a nucleosome crystal structure; we also successfully designed 
both strong and poor histone binding sequences ab initio. For in vivo data from 
S . cerevisiae, the strongest positioning signal came from the competition with other 
factors. Based on sequence alone, our model predicts that functional transcription 
factor binding sites have a tendency to be covered by nucleosomes, but are uncov- 
ered in vivo because functional sites cluster within a single nucleosome footprint, 
making transcription factors bind cooperatively. Similarly a weak enhancement of 
nucleosome binding in the TATA region for naked DNA becomes a strong deple- 
tion when the TATA-binding protein is included, in quantitative agreement with 
experiment. Predictions at specific loci were also greatly enhanced by including 
competing factors. Our physically grounded model distinguishes multiple ways in 
which genomic sequence can influence nucleosome positions and thus provides an 
alternative explanation for several important experimental findings. 
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1 Introduction 



Genomic DNA is packaged into chromatin in eukaryotic cells. The building block of 
chromatin is the nucleosome, 1 a 147 base pair (bp) DNA segment wrapped in ~ 1.8 
superhelical coils on the surface of a histone octamer. 2 The unstructured tails of the 
histones are the targets of numerous covalent modifications 1 and may influence how the 
ordered nucleosome array folds into higher order chromatin structures. Chromatin can 
both block access to DNA 3 and juxtapose sites far apart on the linear sequence. 4 

The regulation of gene expression by DNA-bound factors can be partially understood 
from their binding energies and thermodynamics. Nucleosomes formation energies vary 
by about 5 kcal/mol in vitro, 5 comparable to the variation of DNA-binding energies in 
other factors. It is then natural to ask whether the locations of nucleosomes in vivo can 
be predicted thermodynamically, either in isolation or in competition with other DNA- 
binding factors. Under this scenario, the role of chromatin and its remodeling enzymes 
would be catalytic, modifying the rate of assembly but not the final disposition of factors 
on DNA. The alternative is that chromatin remodeling within a regulatory unit actively 
positions nucleosomes to control access to the DNA, in analogy with motor proteins. 

It has not been possible to quantify by genetics where living cells fall between these 
extremes. Recent computational approaches 6-8 have used collections of DNA positioning 
sequences isolated in vivo to train pattern matching tools that were then applied genome- 
wide. However, the training data may not be representative of direct histone-DNA binding 
if other factors reposition nucleosomes in vivo by exclusion. Furthermore, models based 
on alignments of nucleosome positioning sequences 6 ' 7 require a choice of background or 
reference sequence and it is known that nucleotide composition varies among functional 
categories of DNA. 

2 Results and Discussion 

Biophysical model of a nucleosome core particle. For these reasons we developed 
a biophysical model for nucleosome formation that resolves the energy into the sum of 
two potentials modeling the core histone-DNA attraction and the intra-DNA bending 
energy. The first potential is assumed to be independent of the DNA sequence since there 
are few direct contacts between the histone side chains and DNA bases. 9 For the DNA 
bending, we constructed an empirical quadratic potential 10 ' 11 using a database of 101 
non-homologous, non-histone protein-DNA crystal structures. 

More specifically, we model DNA base stacking energies by defining three displace- 
ments (rise, shift, and slide) and three rotation angles (twist, roll and tilt) for each din- 
ucleotide (Fig. la). 10 Together the six degrees of freedom completely specify the spatial 
position of bp % + 1 in the coordinate frame of bp i (Supporting Information (SI) Fig. 7). 
Geometric transformations based on these degrees of freedom can be applied recursively 
to reconstruct an arbitrary DNA conformation in Cartesian coordinates. Conversely, the 
atomic coordinates in a crystal structure allow the inference of the displacements and 
rotations for each dinucleotide (SI Methods). The elastic force constants in the quadratic 
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DNA bending energy are inferred from the protein-DNA structural data by inverting a 
covariance matrix of deviations of dinucleotide geometric parameters from their average 
values. 10 Strongly bent dinucleotides (with one or more geometric parameters further 
than 3 standard deviations from the mean) are iteratively excluded from the data set (SI 
Methods). Our model does not use any higher order moments of empirical geometry dis- 
tributions, which would lead to a non-quadratic elastic potential; nor are there sufficient 
data to model more than successive base pairs. 

We assume that the histone-DNA potential is at minimum along an ideal superhe- 
lix whose pitch and radius are taken from the nucleosome crystal structure, 2 and varies 
quadratically when the DNA deviates from the superhelix. This sequence-independent 
term represents the balance between the average attractive electrostatic (and other) in- 
teractions between the histones and the DNA phosphate backbone, 12 and steric exclusion 
between the proteins and DNA. 

The sum of the bending and histone-DNA potentials is minimized to yield the elastic 
energy and the dinucleotide geometries for each nucleosomal sequence (see Methods). 
Because the total energy is quadratic, energy minimization is equivalent to solving a 
system of linear equations and thus the algorithm can be applied on a genome-wide scale. 
There is no background in this model, but the results may depend on the choice of the 
protein-DNA structural data set. Since our bending energy is empirical and inferred 
from co-crystal structures, it lacks a physical energy scale. By comparison with the 
worm-like chain model our units can be converted to kcal/mol through multiplication by 
0.26 (SI Methods). We position multiple nucleosomes and proteins bound to DNA using 
standard thermodynamics and enforce steric exclusion between bound entities in any given 
configuration (SI Methods). Our program DNABEND can compute the sequence-specific 
energy of DNA restrained to follow an arbitrary spatial curve. 

Analysis of DNA geometries from the nucleosome crystal structure. One 
way to validate DNABEND is to predict the DNA conformation in the high-resolution (1.9 
A) nucleosome crystal structure 2 (PDB code lkx5) using only the DNA sequence. The 
DNABEND predictions are significantly correlated with the experimental geometries for 
twist, roll, tilt and slide (r > 0.49), but are less successful for shift and rise (SI Fig. 10), 
although the peak positions are generally correct. DNABEND does not reproduce rapid 
shift oscillations in the region between bps 35 and 105, and underestimates the magnitude 
of observed deviations in rise and slide. We under-predict slide since for certain key base 
pairs our training structures imply a much smaller mean value of slide (e.g. 0.18 A for CA 
steps) than that observed in the nucleosome structures (0.91 A for CA steps). Changing 
just this one mean value gives more reasonable magnitudes of slide (SI Fig. 10, black 
curve). Tolstorukov et al. also observed that slide is large and positive in the nucleosome 
crystal structure, and makes a significant contribution to the superhelical trajectory. 13 
Because nucleosomal DNA is highly bent, different degrees of freedom are strongly coupled 
(SI Fig. 11a): for example, base pairs tend to tilt and shift simultaneously to avoid a 
steric clash. These couplings are much less pronounced in the non-histone protein-DNA 
complexes used to derive the elastic energy model (SI Fig. lib), but nonetheless appear 
prominently when the lkx5 DNA is positioned by DNABEND (SI Fig. 11c). 
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Analysis of available nucleosome crystal structures shows that the tight DNA wrap- 
ping is facilitated by sharp DNA kinks if flexible dinucleotides (e.g. 5'-CA/TG-3' or 
5'-TA-3') are introduced into the region where the minor groove faces the histone surface. 
For other sequences and other structural regions the bending is distributed over several 
dinucleotides. 2 We substituted all possible dinucleotides into the lkx5 atomic structure 
(keeping DNA conformation fixed), and computed the elastic energy for each sequence 
variant. The most sequence specific regions are those where the minor groove faces the 
histone octamer (SI Fig. 8a). The specificity is especially dramatic if DNA is kinked 
(e.g. at positions 109, 121 and 131; SI Fig. 10). 2 Although these positions are occupied 
by CA/TG dinucleotides in the crystal structure, the model assigns the lowest energy 
to TA dinucleotides, consistent with the periodic TA signal previously observed in good 
nucleosome positioning sequences 14 (SI Fig. 8b). 

Comparison with in vitro measurements. DNABEND accurately predicts exper- 
imental free energies of nucleosome formation 5,15 ' 16 (Fig. 2a, SI Fig. 12a). In contrast, 
the alignment model of Segal et al. 6 trained on yeast nucleosomal sequences has little 
predictive power for the sequences in this set, most of which were chemically synthesized. 
DNABEND also correctly ranks sequences selected in vitro for their ability to form stable 
nucleosomes, or to be occupied by nucleosomes in vivo (Fig. 2b). 6,17 Because the align- 
ment model is constructed using the latter set it assigns anomalously low (more favorable) 
energies to some of the sequences from it, and higher (less favorable) energies to the ar- 
tificial sequences known from experiment to have excellent binding affinities. 5 ' 17 Finally, 
DNABEND correctly ranks mouse genome sequences selected in vitro on the basis of their 
high or low binding affinity 18 ' 19 (P = 1.42 x 10~ 9 ), whereas the alignment model has less 
resolution (P = 1.73 x 10~ 2 ), though on average it does assign better scores to the high 
affinity set (SI Fig. 12b). 

A further test of how DNABEND actually positions nucleosomes on DNA can be 
provided by a collection of sequences where in vitro positions are known with 1-2 bp 
accuracy. We have determined nucleosome positions on synthetic high-affinity sequences 
601, 603, and 605 17 using hydroxyl radical footprinting (SI Figs. 13 and 14), and obtained 
3 more sequences from the literature. 13 The measured position is always within 1-2 bp of 
a local minimum in our energy, and that energy minimum in 5 out of 6 cases is within 
0.5-1.0 kcal/mol of the global energy minimum (SI Fig. 15; note that the total range of 
sequence- dependent binding energies is ~ 5 kcal/mol). 

We also asked if DNABEND could be used to design DNA sequences with high and 
low binding affinities for the histone octamer. Free energies of computationally designed 
sequences were measured in vitro using salt gradient dialysis 5 ' 20 (SI Table 1, SI Meth- 
ods). The free energy of the designed best sequence was lower than the free energy of the 
designed worst sequence, although only by 1.6 kcal/mol, which is less than the experi- 
mentally known range of free energies (SI Table 1, SI Fig. 12a). These results underscore 
both the ranking power and the limitations of our current DNA mechanics model. 

Periodic dinucleotide distributions in high and low energy sequences. DNABEND- 
selected nucleosome sequences exhibit periodic dinucleotide patterns that are consistent 
with those determined experimentally: 17 for example, with lowest energy sequences, 5'- 
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AA/TT-3' and 5'-TA-3' dinucleotide frequencies are highest in the negative roll regions 
(where the minor groove faces inward), while 5'-GC-3' frequencies are shifted by ~ 5 bp 
(SI Fig. 16). Surprisingly, the distributions of AT and AA/TT, TA dinucleotides are in 
phase, despite a very low flexibility of the former (SI Fig. 8b). It is possible that AT steps 
are used to flank a more flexible kinked dinucleotide. We estimate the energy difference 
between the best and the worst 147 bp nucleosome forming sequences to be 15.2 kcal/mol, 
with the energies of 95% of genomic sequences separated by less than 6.4 kcal/mol (SI 
Methods). This is larger than the experimental range (SI Fig. 12a) because nucleosomes 
cannot be forced in experiments to occupy the worst-possible location on a DNA, but 
instead tend to find the most favorable locations with respect to the 10 bp helical twist. 

Prediction of nucleosome positions in the yeast genome. We use nucleosome 
energies (computed using the 147 bp superhelix) and the binding energies of other reg- 
ulatory factors to construct a thermodynamic model in which nucleosomes form while 
competing with other proteins for access to genomic sequence. A typical configuration 
thus contains multiple DNA-bound molecules of several types and explicitly respects steric 
exclusion. We take all such configurations into account using dynamic programming meth- 
ods 21 that enable us to compute a Boltzmann-weighted statistical sum and thus the prob- 
ability P for each factor to bind DNA starting at every possible position along the genome 
(Fig. lb, SI Methods). 6 We also compute the occupancy of each genomic bp, defined as 
its probability to be covered by any protein of a given type (Fig. lb). There are two 
adjustable parameters for each DNA-binding factor: the temperature, which determines 
how many factors are bound with high probability, and the free protein concentration, 
which determines the average occupancy. With our choice of parameters (which have 
not been optimized, rather, were fixed to allow for comparison with previous studies - 
SI Methods), the average nucleosome occupancy is 0.797, and stable, non-overlapping 
nucleosomes (with P > 0.5) cover 16.3% of the yeast genome. 

Bioinformatic models based on alignments of nucleosome positioning se- 
quences. Several existing analyses of nucleosome positions are based on alignment mod- 
els, and these in turn explicitly 6 or implicitly 7 include a background model. We examined 
the sensitivity of one alignment-based approach to the choice of background by imple- 
menting it in two different ways: alignment model I is identical to Segal et al. 6 and uses 
genomic background for one strand and uniform background for the other, while align- 
ment model II employs genomic background for both strands (SI Methods). Comparing 
two models allows us to separate more robust predictions from those that depend strongly 
on the implementation details. Although the alignment models correlate well with each 
other (r = 0.66), we find a small negative correlation between DNABEND-predicted oc- 
cupancies and energies and those from the alignment model I (SI Fig. 18). DNABEND 
energies are strongly correlated if two nucleosomes are separated by a multiple of 10 bp, 
and anti-correlated if the nucleosomes are separated by a multiple of 5 bp, which puts 
the helical twist out of phase (SI Fig. 17c). Log scores predicted by the alignment model 
exhibit weaker oscillations (SI Fig. 17d), probably due to ambiguities in aligning the 
training sequences. 

Because all three models make somewhat different predictions of average nucleosome 
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occupancies in broad genomic regions (SI Fig. 19), additional experimental data are 
required to establish which model is the most accurate. In contrast, all models predict that 
the distribution of center-to-center distances between proximal stable nucleosomes (with 
P > 0.5) exhibits strong periodic oscillations (SI Fig. 20). However, similar oscillations 
occur in a non-specific model where we first create a regular nucleosomal array and then 
randomly label a given fraction of nucleosomes as stable (data not shown). Because steric 
exclusion is respected by every model, nucleosomes form regular arrays (SI Figs. 17a,b) 
which help induce the observed periodicity in the positions of stable nucleosomes. 

Nucleosome depletion upstream of the ORFs. Microarray-based maps of in vivo 
nucleosome positions show striking depletion of nucleosomes from the promoter regions 
(Fig. 3a). 22 ' 23 We find that this depletion is difficult to explain using nucleosome models 
alone. However, the nucleosome-free region upstream of the open reading frames (ORFs) 
becomes much more pronounced when TATA box-binding proteins (TBPs) and other fac- 
tors bind their cognate sites (Fig. 3a, SI Fig. 21). Displaced nucleosomes are re-arranged 
in regular arrays on both sides of factor-occluded sites, 24 ' 25 creating the characteristic os- 
cillatory structure around the nucleosome-free region which includes the occupancy peak 
over the translation start observed in both regular 22 and H2A.Z 26 nucleosomes (Fig. 3a 
and SI Fig. 22). Similar oscillations occur when non-specific nucleosomes compete with 
TBP (SI Fig. 21), making intrinsic nucleosome sequence preferences hard to disentangle 
from the larger phasing effects in this data set. 

Nucleosome occupancy of TATA boxes. Nonetheless, nucleosome stabilities can 
play a crucial role in gene activation: for example, DNABEND predicts that both TATA 
boxes in the promoter of the yeast MEL1 (a-galactosidase) gene are occupied by a stable 
nucleosome, in agreement with the extremely low level of background gene expression ob- 
served in MEL1 promoter-based reporter plasmids 27 ' 28 (Fig. lb). In contrast, the TATA 
elements of the CYC1 promoter were shown to be intrinsically accessible in vivo, 29 ' 30 
resulting in high background expression levels. Consistent with these findings, we pre- 
dict that one of the CYC1 TATA boxes has intrinsically low nucleosome occupancy, and 
moreover that the nucleosome is easily displaced in competition with TBP (Fig. lb). 

It has been suggested that TATA box- containing genes may be repressed through 
steric exclusion of TBP by a nucleosome placed over the TATA box. 31 In agreement 
with this hypothesis, in the absence of other factors DNABEND predicts slightly higher 
nucleosome occupancy over TATA boxes (Fig. 3b). Stress-induced genes are TATA-rich 
and may be nucleosome-repressed under non-stress conditions. 31 Thus, this prediction 
of DNABEND can be tested experimentally by measuring nucleosome occupancy over 
TATA boxes of stress-induced genes in non-inducing conditions, genome-wide (to ensure 
adequate statistics which were not provided by the data sets available to us, 22 although 
it was found in a very recent genome-wide study of nucleosome occupancy in yeast that 
promoters of stress-induced genes tend to be covered by nucleosomes when cells are grown 
in YPD media, whereas the transcription start site for the typical gene was nucleosome- 
free 32 ). 

Nucleosome-induced TF cooperativity and nucleosome depletion over TF 
binding sites. It has been known from chromatin immunoprecipitation experiments 
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that some near-consensus DNA sequences are not occupied by their cognate transcription 
factors (TFs), while poorer sites may be occupied. 33 Thus it is natural to ask whether 
histone binding preferences can help distinguish functional and non-functional TF binding 
sites. Segal et al. found that nucleosome occupancy was intrinsically smaller at functional 
sites 6 (SI Fig. 24a), while DNABEND predicts the opposite (Fig. 4a, upper panel) (but 
see contrary low resolution data from Liu et al. 35 ). However, if TFs are allowed to compete 
with nucleosomes at all sites, the nucleosomes become preferentially depleted (and the TF 
occupancy becomes higher) at the functional sites (Fig. 4a, lower panel). This depletion 
is not due to the slightly more favorable binding energies of the functional sites (SI Fig. 
25a), because it is reversed when their positions are randomized within intergenic regions 
(SI Fig. 24c). Furthermore, DNABEND-predicted nucleosomes are not depleted over 
functional sites simply because they are less stable - Fig. 4a showed the opposite. In 
fact, functional sites tend to occur just upstream of the ORFs (SI Fig. 25b), where 
DNABEND-predicted nucleosomes exhibit enhanced stability (SI Fig. 23). The only 
remaining possibility is that the depletion of DNABEND-predicted nucleosomes is caused 
by the spatial clustering of functional binding sites (Fig. 4b, SI Fig. 25c). In this scenario 
several DNA-binding proteins cooperate to evict the nucleosome and thus enhance their 
own binding, in a phenomenon known as the nucleosome-induced cooperativity (Fig. 
4c). 34,36 The cooperativity does not depend on direct interactions between TFs and 
requires only that two or more TF sites occur within a 147 bp nucleosomal footprint (Fig. 
4d). A model in which TFs compete with nucleosomes for their cognate sites provides 
the best explanation for the strong nucleosome depletion over functional sites observed 
in microarray experiments 22 (Fig. 4e). Although it is not yet conventional to do so, 33 it 
would be trivial to reward clustering within 147 bp in codes that predict regulatory sites. 

Predictions of experimentally mapped nucleosome positions. We used a set 
of 110 nucleosome positions reported in the literature for 13 genomic loci to see how 
well DNABEND predictions match in vivo chromatin structure (Fig. 5, SI Figs. 26a- 
n). The predictive power of nucleosome models improves considerably when sequence- 
specific factors are included (Fig. 5), though not in genome-wide sets (SI Fig. 27); this 
result may be due to the lack of accurate genome-wide knowledge of TF binding sites and 
energies. For example, at HMLa, HMRsl, and recombination enhancer loci, nucleosomes 
are positioned in regular arrays whose boundaries are determined by the origin recognition 
complex, Abfl, Rapl, and Mcml/MATo;2 bound to their cognate sites (SI Figs. 26f-h). 

Both intrinsic sequence preferences and boundaries created by other factors contribute 
to positioning nucleosomes: on one hand, at the GAL1-10 locus a nucleosome covering 
a cluster of GAL4 sites is evicted, making ~ 200 bp of promoter sequence accessible to 
TBP and other factors (SI Fig. 26a), as observed in vivo. 37 On the other hand, GCN4 
and ABF1 sites at the HIS3-PET56-DED1 locus are intrinsically nucleosome-depleted, 
because histones have lower affinity for HIS3-PET56 and DED1 promoters (Fig. 6, SI 
Fig. 26k). 38 DNABEND correctly predicts chromatin re-organization caused by sequence 
deletions in the HIS3-PET56 promoter region: 38 sequence-dependent nucleosome posi- 
tions are refined through the action of GCN4 and TBP to improve the agreement with 
experiment (SI Figs. 261-n). 
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3 Conclusion 



We have developed a DNA mechanics model capable of accurately predicting in vitro 
free energies of nucleosome formation, optimal base pairs in the minor and major grooves 
of nucleosomal DNA, and DNA geometries specific to each base step. We only get an 
agreement with the available genome scale data sets 22 ' 23 for nucleosome depletion from the 
TATA box and functional TF binding sites when we include competition with the relevant 
factors. In the absence of DNA-binding factors DNABEND predicts a weak enhancement 
of nucleosome occupancy over their sites and thus agrees with the conjecture, not yet 
demonstrated on a genome-wide scale, that nucleosomes provide a default repression. 31 
The two alignment models we examined do not support this conjecture; it is conceivable 
that they have implicitly captured a hybrid signal from both nucleosomes and DNA-bound 
factors. 

The highest quality data on generic nucleosome occupancy come from the specific loci 
summarized in Fig. 5. There is a relatively weak signal from all models in the absence 
of other factors, originating from a few correct predictions at short distances. Because 
DNABEND predicts correct binding energies, the weak signal suggests that nucleosomes 
positioned by thermodynamics on bare DNA have relatively weak correlation with in vivo 
positions, while inclusion of other factors substantially improves the picture. DNABEND 
presents a useful biophysical framework for analysis of in vivo nucleosome locations and 
TF-nucleosome competition. It will be interesting to examine its predictions for metazoan 
genomes, and to modulate gene expression levels in model systems through computational 
design of nucleosome occupancy profiles. 



For full details, see SI Methods. DNABEND software and additional supporting data 
(including a S.cerevisiae nucleosome viewer) are available on the Nucleosome Explorer 
website: http://nucleosome.rockefeller.edu. 

The total energy of a nucleosomal DNA is given by a weighted sum of two quadratic 
potentials: 



where E e i is the sequence-specific DNA elastic energy 10 ' 11 and E s h is the histone-DNA 
interaction energy (see SI Methods). The potentials are quadratic with respect to de- 

— * — * 

viations of global displacements 5d s and local angles 5Q S from their ideal superhelical 
values. The weight w is fit to maximize the average correlation coefficient between the 
distributions of geometric parameters observed in lkx5 2 and the DNABEND predictions 
(SI Fig. 10). The conformation adopted by the DNA molecule is the one that minimizes 
its total energy E, yielding a system of linear equations: 



4 Methods 



E = E e i + wE sh , 




dE/d5al = s — 1...N, i = 1 . . . 6, 



(2) 
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— * — * 

where Sa s = (SQ S , 5d s ) and N is the number of dinucleotides. The nucleosome energies 
(assumed to be given by E el ) and the energies of other DNA-binding factors at each 
genomic position are then used as input to a dynamic programming algorithm 21 which 
outputs binding probabilities and bp occupancies for each DNA element. 
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Figure 1: DNA mechanics model of histone-DNA interactions is used to predict in vitro 
and in vivo nucleosome positions, a) Conformation of a single DNA base step (defined as two 
consecutive DNA base pairs in the 5' — > 3' direction) is described by six geometric degrees of 
freedom: rise, shift, slide, twist, roll, and tilt. 10 DNA base pairs are shown as rectangular blocks. 
The nucleosome energy is a weighted sum of the elastic energy E e i and the restraint energy E s h 
which penalizes deviations of the DNA path from the ideal superhelix (see Methods), b) Nucleosome 
positions explain background gene expression levels observed in reporter plasmids. Panel I (from top). 
Blue: nucleosome energies (in arbitrary units) in the CYC1 promoter region from the lacl::lacZ 
reporter plasmid, 29 with the 10-11 bp periodicity due to DNA helical twist. Vertical green lines: 
energies of TBPs bound to two experimentally mapped TATA boxes 29 (other binding positions are 
not allowed). Panel II. Probability of a nucleosome to start at each base pair, in the absence (blue) 
and presence (maroon) of TBP. Some of the latter nucleosomes are also shown as orange ovals (note 
that in general nucleosome positions may overlap). Green: probability of a TBP to bind a TATA 
box. Panel III. Nucleosome occupancy in the absence (blue) and presence (maroon) of TBP. Green: 
TBP occupancy, red vertical lines: TATA box positions. Arrows on the right correspond to the order 
of calculations. Panel IV. Nucleosome occupancy of the MEL1 promoter region from the DIT1::GFP 
reporter vector 27 (see CYC1 legend for the color scheme). Note that blue and maroon occupancy 
profiles completely overlap. 
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Figure 2: DNABEND accurately ranks free energies of nucleosome formation and sets of 
nucleosome sequences, a) Nucleosome free energies of natural and synthetic sequences ranked by 
DNABEND (upper panel) and by a probabilistic model based on the alignment of sequences extracted 
from yeast mononucleosomes (lower panel). 6 Blue triangles: Thastrom et al. nucleosome dialysis 
assay, 5 black triangles: Shradereta/. nucleosome exchange assay. 15, 16 Free energies were computed 
using only the central 71 bp of the 147 bp nucleosomal site (cf. SI Fig. 12), because competitive 
nucleosome reconstitution on DNAs with any lengths between 71 and 147 bp gives identical apparent 
free energies, and quantitatively equivalent free energies are obtained using either the full histone 
octamer or just the core histone tetramer. 6,20 b) Histograms of DNA elastic energies (in arbitrary 
units) and alignment model log scores computed using the 147 bp nucleosomal site, consistent with 
sequence lengths found in the in vitro selection on the yeast genome. 6 Yeast genomic sequences are 
compared to three sets of sequences selected for their nucleosome positioning ability. Blue: energies 
of all 147 bp long sequences from S.cerevisiae chromosome III, green: energies of sequences from a 
genome-wide in vivo mononucleosome extraction assay, 6 red: energies of sequences from an in vitro 
selection assay on yeast genomic DNA, 6 black: energies of sequences from a SELEX experiment on a 
large pool of chemically synthesized random DNA molecules. 17 Sequences shorter than 147 bp were 
omitted from all selected sequence sets; in sequences longer than 147 bp the most favorable energy 
was reported, taking both forward and reverse strands into account. 
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a 




Figure 3: Nucleosome-free regions are explained by bound TBPs. a) Average nucleosome 
occupancy plotted with respect to the translation start sites (black arrow) (a), and TATA boxes 39 
(blue rectangle) (b). Green: nucleosomes only (DNABEND), dark blue: nucleosomes only (alignment 
model I), 5 blue: nucleosomes only (alignment model II), red: nucleosomes (DNABEND) competing 
with TBPs for the access to TATA boxes, black: nucleosome positions inferred from a microarray 
measurement. 22 
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Figure 4: Nucleosome-induced TF cooperativity explains nucleosome depletion over func- 
tional TF binding sites, a) Difference in the average nucleosome occupancy of functional and 
non-functional TF binding sites. 33 The difference is negative if the functional sites are depleted 
of nucleosomes. Green squares indicate statistically significant occupancy differences (P < 0.05). 
Upper panel: nucleosomes only, lower panel: nucleosomes competing with TFs (allowed to bind 
both functional and non-functional sites). Functional sites are conserved in at least 3 out of 4 sensu 
stricto yeast species and are strongly supported by ChlP-chip data (probes bound by the factor with 
P < 0.001), whereas the non-functional sites are not conserved and have weaker evidence for in 
vivo binding (P < 0.005). 33 b) Functional binding sites are clustered (SI Fig. 25c) and thus exhibit 
nucleosome-induced cooperativity 34,36 c) Nucleosome-induced cooperativity requires that a single 
nucleosome overlap two or more TF binding sites, d) Nucleosome-induced cooperativity in a model 
system with two TF sites covered by a single nucleosome. Nucleosome and TF occupancies at TF 
sites are plotted as a function of the binding energy (assumed to be the same for both factors). 
Dashed curves: occupancies in the absence of the other TF (no cooperativity), solid curves: occu- 
pancies in the presence of both TFs (cooperativity). e) Average nucleosome occupancy of functional 
and non-functional TF binding sites on chromosome III. Red: Yuan et al. measurements, yel- 
low: nucleosomes in the presence of TFs (DNABEND), green/blue: nucleosomes in the absence of 
TFs (DNABEND/alignment model I 6 ), dark blue/violet: stable nucleosomes in the absence of TFs 
(P > 0.5, DNABEND/alignment model I). The best discrimination is achieved by DNABEND when 
TFs are included. 
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Figure 5: Both TF and histone binding must be taken into account for accurate prediction 
of in vivo nucleosome positions. Fraction of correctly predicted nucleosomes as a function of the 
center-to-center distance between predicted nucleosomes and 110 nucleosomes with positions known 
from the literature. 6,38 In the absence of sequence-specific proteins, the fraction of correctly predicted 
nucleosomes is > 0.5 only for the PH05, HMRa, and HIS3 deletion loci with DN ABEND (SI Fig. 
26), and for the GAL1-10, STE6 loci with alignment model I, 6 for distances < 20 bp. See SI Fig. 
27 for details of implementation, including the definition of the null (random) model with steric 
exclusion. Results from the alignment model II are similar (data not shown). 



19 



HIS3-PET56-DED1 WT 




— , — i , . , i — > • > i — 1 1 . 1 1 — 1 1 1 — 1 1 1 1 — i , , , i — i > > i 

721600 721800 722000 722200 722400 722600 722800 723000 723200 723400 



>n 1 - 
O 

C 0,75 

O. 0.5 

| 0.25 



JlftJL 



i 1 1 1 ■ 1 1 1 ■ ■ 1 1 1 ■ 1 1 1 1 1 1 1 1 1 1 1 1 1 ■ ■ 1 1 1 ■ ■ 1 1 1 ■ |~~ 

721600 721800 722000 722200 722400 722600 722800 723000 723200 723400 



Chromosome 15 



PET56<- 



> HIS3 



-> DED1 



Figure 6: Detailed view of the HIS3-PET56-DED1 wildtype locus. Upper panel: 
DNABEND-predicted nucleosome occupancy, both with (maroon) and without (blue) other DNA- 
binding factors. Lower panel: TBP occupancy (green), GCN4 occupancy (dark blue), and ABF1 
occupancy (pink). Orange ovals: experimental nucleosome positions mapped by Sekinger et a/. 38 
Black arrows indicate translation start sites. We used a GCN4 weight matrix from Morozov et a/. 40 
and an ABF1 weight matrix from Maclsaac et a/. 33 to compute TF DNA-binding energies. The TBP 
weight matrix was derived using an alignment of TATA box sites from Basehoar et a/. 39 See SI Fig. 
26 for further details. Note that the region around bp 721800 is intrinsically devoid of nucleosomes, 
allowing TFs to bind in the nucleosome-free gap. 
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Supporting Information for: 
Extrinsic and Intrinsic Nucleosome Positioning 

Signals 

5 Supplementary Methods 
5.1 DNA geometry 

We model each DNA basepair as a rigid body and specify its position in space using a 
local coordinate frame attached to each basepair and defined using Ri = [n l x ,n l y ,n l z ] and 
fi{i = 0...N). Here 

^x,y,z are the orthonormal basis vectors of the local frame, and f; is 
its origin in the fixed global coordinate frame. Both the basis vectors and the vector to 
the origin can be expressed in terms of 6 independent parameters that are sufficient for 
reconstructing the spatial position of any rigid body. 1 Thus apart from a single global 
translation and rotation an arbitrary DNA conformation with N + l basepairs is uniquely 
specified if N sets of 6 geometric parameters are known. By convention, 2-5 the geometric 
parameters are chosen to be the three angular degrees of freedom which define unit vectors 
of the local frame attached to the basepair % in the local frame attached to the basepair 
i — 1, and the displacement vector which gives the origin of frame % with respect to origin of 
frame i — 1: a^ — dj), i — 1 . . . N . Here Qi = p i? £j} are the helical twist, roll and 
tilt angles, and di is the displacement vector with the x,y,z components called slide, shift 
and rise (Fig. la). 2-5 Since the geometric parameters which specify the spatial position 
of the basepair i are defined with respect to the frame rigidly attached to the previous 
basepair, their values capture local deviations in the DNA conformation. However, they 
can also be used to recursively construct the rotation matrix for the basepair i in the 
global frame: 

Ri = Ri-iTi (i — 1 . . . N), (3) 
where each Tj matrix is the product of three rotations: 

T t = + 0,)M,(r,)M z (-^ - 0,). (4) 

Here, 

(cos 9 sin# \ 
1 , 
— sin 9 cos 9 J 

(cos 9 sin 9 
-sin# cos# 
1 

rWp? + *?) 1/2 , 

cos^ = Pi/Ti, 
sin0j = U/Ti. 
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Note that ^ y {0) and M. z (9) are the rotation matrices around the y and z axes, respec- 
tively. The middle term on the right-hand side of Equation Q introduces both roll and 
tilt with a single rotation through angle Tj around a roll-tilt axis. The roll-tilt axis lies in 
the x-y plane of the mid-step coordinate frame (mid-step triad, or MST: Supplementary 
Fig. 7), and is in general inclined at an angle fa to the y-axis. 3 In this scheme, it is the 
mid-step triad that is used to transform the displacement vector di into the fixed global 
frame: Sf^ = Rf IST di. Introducing a single roll-tilt rotation axis is preferable to separate 
roll and tilt rotations because it eliminates the non-commutativity problem associated 
with the roll and tilt rotation operations. 3 ' 4 

Similarly to the basepair rotation matrices, the mid-step triads are also constructed 
recursively: 



Thus having a complete set of local geometric parameters di is equivalent to knowing 
the global DNA conformation: first, the recursive relation ^ is employed to determine 
the orientations of all basepair coordinate frames (except for the first one which has to 
be fixed in space independently and thus provides the overall orientation and position of 
the DNA molecule). Second, Equation ^ is used to construct the MST frames, which 
are then employed to transform all local displacements di into the global frame. Finally, 
all displacement vectors are added up vectorially to determine the origins of the basepair 
coordinate frames. If necessary, the basepair frames can then be used to reconstruct the 
positions of all DNA basepair atoms (using an idealized representation which neglects 
flexibility of bases in a basepair). Note that knowing a basepair coordinate frame is in 
general insufficient for predicting the positions of the phosphate backbone and sugar ring 
atoms because of their additional degrees of freedom. Finally we remark without showing 
the details of the calculations that the inverse problem is also well-defined: a full set of 
basepair and MST rotation matrices in the global frame is sufficient to reconstruct all 
local degrees of freedom We have implemented the solution to the inverse problem 

following a previously published description. 4 

While the helical twist serves to rotate the consecutive DNA basepairs with respect 
to one another, introducing non-zero roll and tilt angles imposes curvature onto the DNA 
conformation. Indeed, if both roll and tilt are negligible Tj is simply a rotation through 
— Q around the z-axis (which coincides with the helical axis in this case), and T^ ST is a 
rotation through —Q/2 around the same axis. The action of roll and tilt can be seen more 
clearly by considering the curvature vector, defined as the difference between the tangent 
vectors for two consecutive basepairs: Ki = n l + l — n l z . The expression for curvature is 
simplified in the limit of small roll and tilt: since the magnitude of roll (< 20°) and tilt 
(< 10°) are typically much smaller than the helical twist per basepair (~ 36°), Tj can be 
expanded to 0(p,t): 




(5) 



where 



(6) 
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cosf2 — sinf2 p cos(f2/2) + t sin(f2/2) 

Ti(n,p } t)=\ sinfi cosfi psin(fi/2) -tcos(fi/2) 

-pcos(fi/2) +tsin(fi/2) psin(fi/2) + i cos(fi/2) 1 

(7) 

Because we are primarily interested in the magnitude of the curvature vector, we can 
compute it in the local frame of basepair i where z = (0, 0, 1) is the z-axis 

unit vector. Using Equation <Q we obtain: 

(Pi cos(f2i/2) + ti sin(f2;/2) 
pi sin(n i /2) - U cos(f^/2) 

Thus in this limit the magnitude of the curvature vector contains equal contributions from 
both roll and tilt: = (pf + t 2 ) 1 ' 2 = T,. Furthermore, the roll and tilt contributions to 
curvature are shifted by 90° with respect to one another. 

Transformations between local and global descriptions of the DNA molecule are im- 
portant because the sequence-specific DNA elastic energy is best described in terms of 
the local degrees of freedom whereas it is the global coordinates that can be most 

naturally restrained to follow an arbitrary spatial curve. 

5.2 Ideal super helix 

In nucleosome core particles the spatial curve used to restrain DNA conformation is 
an ideal left-handed superhelix with pitch P and radius R, described by the following 
parametric equation: 

J Rcos(2vrt/P), 

r(i) = { Rsm(2nt/P), (8) 



Using the arc length s = \/ (2nR/P) 2 + 1 t to parameterize the curve we obtain: 

Rcos(s/R e ff), 

r(s) = { Rsm(s/R eff ), (9) 
-(P/2nR eff )s 



where R e ff — ^R 2 + (P/2n) 2 . A local frame at position s is given by a set of three 
orthonormal Frenet vectors (tangent, normal, and binormal): 

= dr/ds, 

= di/ds/\dt/ds\, (10) 
= t x n. 

We position iV+1 sets of Frenet basis vectors equidistantly along the superhelical curve: 
Si = 2iiR e ffa(i/N) (i — . . . N), where a is the number of nucleosomal superhelical turns. 
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Fitting an ideal superhelix to the high-resolution crystal structure of the nucleosome core 
particle 6 reveals that 

' a = 1.84, 

P = 25.9A, (11) 
R = 41.9 A 

In order to transform the slowly rotating Frenet basis vectors into the helically twisted 
local frames associated with DNA basepairs, we impose an additional helical rotation 
around the tangent vector t: 

n'( Sl ) = R z (Ql ot )n( Sl ), 
Here VL\ ot = iQq is the cumulative helical twist, = 34.696° is the average helical 



twist from the structure, and the rotation specified by Equations (12) is performed in 
the (t,n, b) basis. Similarly, the MST frames are located at s, = 2iiR e ffa(i — l/2)/N 
(i = 1 . . . N), and are rotated through (i — 1/2)Q to bring them into register with the 
helical twist. 

A superhelix described by Equation (j8l) has constant curvature which is manifested 
by the constant difference between the consecutive tangent vectors: |t(s i+ i) — t(sj)| = 
2 sin(7ra/iV). This allows us to determine the length of the roll-tilt vector: Tj = 4.53°, Vi 
(which also gives the maximum value of roll and tilt). In this idealized picture, roll and 
tilt can be assumed to make equal contributions to the curvature. Indeed, reconstruction 
of the local geometric parameters {a} from the full set of helically twisted Frenet frames 
results in f2j = (f2 , Tj cos(Q l tot + O ), Tj sin(f2£ ot + O )) and d { = (0,0, d), where d = 
3.333 A and <f) is the initial phase determined by the location of the first basepair. Thus 
twist and rise are constant for every basepair in the ideal superhelix, slide and shift are 
zero, whereas roll and tilt exhibit oscillations resulting from the superhelical curvature, 
and shifted by 90° with respect to one another (Supplementary Fig. 10). 



5.3 DNA conformational energy 

The total DNA conformation energy is a weighted sum of two terms: the sequence-specific 
DNA elastic energy designed to penalize deviations of the local geometric parameters from 
their average values observed in the protein-DNA structural database, and the restraint 
energy which penalizes deviations of the DNA molecule from the nucleosomal superhelix. 
Thus all interactions (primarily of electrostatic nature) between the DNA molecule and 
the histone octamer are modeled implicitly by imposing the superhelical restraint. 

DNA elastic energy. Using the local degrees of freedom «j, we represent the sequence- 
specific DNA elastic energy by a quadratic potential: 7 

1 N 

E d = 2 5> S " (a n(s) )} T F n(s) [a s - <c^>], (13) 

s=l 
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where the sum runs over all consecutive dinucleotides (basesteps) s and (a n ) are the 
average values computed for the basestep type n = AA, AC, AG, ..,TT using a collection 
of oligonucleotides extracted from a set of 101 non-homologous protein-DNA structures. 8 
The matrix of force constants F n is evaluated by inverting the covariance matrix C n of 
deviations of local geometric parameters from their average values (a n — (a n )): 7 

(F™)- 1 = C n , where C% = (« - - (a]))). (14) 

Note that our elastic energy model utilizes only and first and second moments of the 
empirical geometry distributions, and thus disregards all non-Gaussian effects such as 
skewness and bimodality. Including higher order moments amounts to introducing non- 
quadratic corrections to the elastic energy model, which makes the computation much less 
tractable. This problem is partially alleviated however by removing dinucleotides from the 
data set if one or more of their geometric parameters are further than 3 standard deviations 
away from the mean. The mean is then recomputed and the procedure is repeated until 
convergence. 7 DNA elastic energy is a function of 10 basestep type - dependent average 
values for each of the 6 local degrees of freedom (all averages and covariances for the 
dinucleotides related by reverse complementarity are equal by construction), and of 15 
independent force constants in each symmetric 6D matrix F n (n — 1 . . . 10). 

In order to be able to combine the DNA elastic term with the superhelical restraint 
term defined in the global frame, we apply the following coordinate transformation to 



Equation (13): 



Rs _ ( Rf ST ) (15) 

(1 and denote 3D unit and zero matrices, respectively). Applying R s to the local 
degrees of freedom leaves the angles (twist, roll, and tilt) invariant, but transforms the 
local displacements (shift, slide, and rise) into the global frame. To keep the elastic energy 
invariant, the force constants have to be transformed as well: F n = Rgi^Rj 1 . Equation 



(13) then becomes: 

1 N 

E el = -]T[R> S - (c^»] T F^[R s (a s - (a B W»], (16) 

s=l 

Finally, it is convenient to change variables so that all degrees of freedom are expressed 
in terms of their deviations from the ideal superhelix: 

d s — d s ~\~ &d s , 

n s = n° s + sn s . 1 ' 

Note that all the transformations described in this section involve no additional ap- 



proximations to the original DNA elastic energy (Equation (13)). Thus we are free to 



ideal superhelix in Equation (16) 



choose the most convenient rotation matrix in Equation (15), and use Rf IST \o from the 
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Superhelical restraint energy. Superhelical restraint energy is used to bend micro- 
somal DNA into the superhelical shape. We define the restraint energy as the quadratic 
potential: 

JV 

E. fc = 5>--*?) 2 > (18) 

3=1 

where r s and are the nucleosomal and the ideal superhelix radius-vectors in the global 
frame (s = 1 . . . N): 



la) 

=0 _ ~0 , v^s 7Cg),0 ( 19 ) 



r s = *0 + £J=l ^ 



Then to the lowest order the difference between the radius vectors is given by: 

(20) 



r _ r 0,P 
s s 



3=1 



a=l j'=l 



where a,/? = 1..3 label the vector components, and 



are the connectivity coefficients constructed as a product of the first derivative of the 
rotation matrix with respect to the frame angles and the ideal displacements c$ in 



the local frame. The first term in Equation (20) represents the net change in the global 
radius vector rf caused by the changes in the displacements dj up to and including the 
basestep s, while the second term reflects the change in the global radius vector resulting 
from modifying one of the rotation angles j < s. Note that changing a single rotation 
angle at position j affects every downstream basepair position linearly by introducing a 
kink into the DNA chain, whereas making a displacement change at that position simply 
shifts all downstream coordinates by a constant amount. The first derivative of the 
rotation matrix is evaluated as: 

{a T MST 
Tu...Ty ^T r . ,7^' l<f<j 



Upon substitution of the expansion (20 ) into the restraint energy we obtain an effective 
quadratic potential: 

E sh = <^ T G W, (21) 

«,i=i 

where 5a l = (5Qi,5di), and the 6x6 matrix of force constants is given by three distinct 
3x3 submatrices: 



G 13 



H F- 
F G 

1 y ^"13 
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Here, 



N 



= S a/ 3 ^] O s i@sj, 
s=l 

N 

F if = E ^^sjOijb^ 01 , 

s,l=l 

N 3 

H if = E QskQslQkiQlj ^ 
s,k,l=l 7=1 



where 



Oij - 



1, i>3 
0, i<j 



is the theta function, and 8 a p is the Kronecker delta. G^f couples displacements with 



displacements, couples displacements with angles (and thus has one connectivity 
coefficient), and couples angles with angles through two connectivity coefficients. 
Summing over some of the theta functions we obtain: 



Gf = 6 af3 B(i,j), 

N 

Ft/ = $>(M)C 

1=3 

N N 3 
k=i l=j 7=1 

where B(i,j) = J2f=i Qsflsj = N + 1- max(i, j). 

Total energy and DNA conformation minimization. The total energy of nucleoso- 
mal DNA is given by: 

E = E el + wE sh , (22) 

where w is the fitting weight introduced to capture the balance between favorable histone- 
DNA interactions and the unfavorable energy of bending DNA into the nucleosomal super- 
helix. We fit w to maximize the average correlation coefficient between the distributions of 
geometric parameters observed in the high-resolution crystal structure of the nucleosome 
core particle (lkx5), 6 and the corresponding DNABEND predictions (Supplementary Fig. 
10). This procedure yields w — 0.1 for the 147 bp superhelix and w = 0.5 for the 71 bp 
superhelix bound by the H32H42 tetramer (bps 39 through 109 in the 147 bp superhelix). 
DNABEND is not very sensitive to the exact value of w: we found a correlation of 0.99 
between the free energies computed using w — 0.1 and w = 0.5 and the 71 bp superhelix 
(data not shown). Note that the total energy is a sum of two quadratic potentials written 



27 



out in terms of the deviations of the global displacements Sd s and the local frame angles 
5Q S from their ideal superhelical values. Given the value of the fitting weight, the confor- 
mation adopted by the DNA molecule is the one that minimizes its total energy E. Since 
the energy is quadratic, finding the energy minimum is equivalent to solving a system of 
linear equations: 

dE/d8al = s = l...N, i = 1 . . . 6. (23) 



The numerical solution of the system of equations (23) provides a set of geometric pa- 
rameters corresponding to the minimum DNA energy: (5Q s ,5d s ). These can be used to 
find the original geometric parameters in the local frame: (0° + 5Q S , d^ + R^ ST 1 5d s ). 

Worm-like chain. According to the worm-like chain model, the energy required to bend 
a DNA molecule is sequence-independent and given by: 9 

where Lq is the contour length of the molecule, L p is the persistence length (estimated to 
be ~ 400 A), 9 fcgT ~ 0.6 kcal/mol, and t is the unit tangent vector. The contour length 



obtain \dt/ds\ 2 = R 2 / R\ ff) and thus 



of the ideal 147 bp superhelix is given by 146 x 3.333 A. From Equations (|9|) and (J 1 0|) we 



kgT LpL/Q R 
E wic = ^ — ^ 32.6 kcal/mol. 

K eff 



In Fig. 2b, the mean and the standard deviation a of DNABEND energies computed 
for chromosome III are 127.0 ± 6.1. Equating the worm-like chain model estimate with 
the mean DNABEND energy, we obtain a scaling coefficient of 0.26 which can be used 
to express DNABEND energies in kcal/mol. This gives the difference of 15.2 kcal/mol 
between the best and the worst chromosome III sequences, and a = 1.6 kcal/mol. Most 
sequences differ by 2a or less in Fig. 2b and are thus separated by < 6.4 kcal/mol. 
A similar value of the scaling coefficient (0.21) arises from a linear model fit between 
experimental and DNABEND-predicted free energies in Supplementary Fig. 12a (red 
circles) . 



5.4 Predicting genome-wide occupancies of DNA-binding pro- 
teins 

Chromatin structure plays an important role in regulating eukaryotic gene expression. 10-13 
Eukaryotic DNA is packaged by histones into chains of nucleosomes that are subsequently 
folded into ~ 30 nm chromatin fibers. Because arrays of nucleosomes form on chromoso- 
mal DNA under physiological conditions, in order to predict genomic nucleosomal occu- 
pancies we need to describe DNA packaged into multiple non-overlapping nucleosomes. 
Furthermore, nucleosomes may compete with other DNA binding proteins for genomic 
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sequence. For example, several closely spaced TF binding sites may serve to displace 
the nucleosomes from the promoter region upon TF binding. To take the competition 
between nucleosomes and other factors into account we need to consider configurations 
with multiple regularly spaced nucleosomes (accomodating a 20-25 bp linker between the 
end of one nucleosome and the beginning of the next) as well as other DNA-binding pro- 
teins. The probability of any such configuration can be computed if we can construct a 
statistical sum (partition function) over all possible configurations. Though the sum has 
exponentially many terms and is thus impossible to evaluate by brute force for all but 
very short DNA sequences, it can nonetheless be efficiently computed with a dynamic 
programming algorithm. 14 

Here we develop the dynamic programming approach for a general case of M objects of 
length Lj (j = 1 . . . M) that are placed on genomic DNA (i.e. occupy Lj DNA basepairs). 
The objects could represent nucleosomes, TFs, or any other DNA-binding proteins. The 
binding energy of object j with DNA at each allowed position i is assumed to be known: 
Ef (i — 1 . . . N — Lj + 1), where N is the number of basepairs. For nucleosomes we will 
carry out a genome-wide computation of DNA elastic energies as described above, while for 
other factors the energy landscapes will be constructed based on their binding preferences 
inferred from footprinting experiments, SELEX assays, bioinformatics predictions, etc. 
We assign index to the background which can be formally considered to be an object of 
length 1: Lq = 1. In the simplest case which we consider here the background energy is 
zero everywhere, but more sophisticated models could incorporate a global bias by making 
positions near DNA ends less favorable, etc. 

We wish to compute a statistical sum over all possible configurations in which object 
overlap is not allowed (including the background "object"): 

Z = J2e~ E{conf \ (25) 

conf 

where E(conf) = 2~^j=o ^2i=i ^cU) * s ^ ne total energy of an arbitrary configuration of non- 
overlapping objects, N 3 ob - is the number of objects of type j, and E 3 ,^ is the pre-computed 
energy of the object of type j which occupies positions c(i) through c(i) + Lj — 1. 

It is possible to evaluate Z (or the free energy F = log(Z)) efficiently by recursively 
computing the partial statistical sums: 

M 

Z( = £ zU^^-Wi-to-i), i = l-..N, (26) 

3=0 

with the initial condition Zq — 1. The theta function is defined as: 

{'::!•" < 27 > 

It is computationally more efficient to transform Equation (26) into log space by defining 
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the partial free energies Fi = \ogZ{: 

( M 

Fi = log [J2 e Fl - LrE '- L > +1 t -L J+ i ) , i = 1 • • • N, (2* 



vi=o 



with the initial condition Fq = 0. For numerical stability, we rewrite Equation (28) as an 
explicit update of the partial free energy computed at the previous step: 

Fi = + log \J2 e Fi -^" Fi - 1 " £? -^ +1 i -x J+ i j , i = 1 • • • N. (29) 

The free energy computed in the final step is the full free energy where all possible 
configurations are taken into account: Fn = F = log(Z). Since the algorithm proceeds 
by computing partial sums from 1 to N it is often called the forward pass. Similar 
equations can be constructed for the backward pass which proceeds from N to 1: 

M 

Z[ = ZZ +L .e- E h N ^ Lj+2 , i = N ... 1, (30) 

3=0 



with the initial condition Z T N+1 = 1. We transform Equation (30) into log space by 
defining the backward partial free energies Ri = \ogZ\: 

R t = hg(^e R ^- E '9 N ^ Lj+2 ^, i = N...l, (31) 

with the initial condition i2jv+i = 0, or equivalently: 

Ri = R l+l + log \J2 e R * +L J- Rl+1 - E *9 N _i_ Lj+2 ^j , i = N...l. (32) 

Note that R\ = F N = F by construction. 

With the full set of forward and backward partial free energies we can evaluate any 
statistical quantity of interest. For example, the probability of finding an object of type 
j at positions (i . . . i + Lj — 1) is given by: 

^ = zUe~^Zl +L] = eFi _^ E{+Ri+Lj _ F ^ i = 1 N _ L . + 1 (33) 

Another quantity of interest is the occupancy of the basepair i by object j, defined as 
the probability that basepair i is covered by any object of type j: 15 

i 

01= * = 1 --- N - ( 34 ) 

k=i-(Lj- 1) 
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(note that P° k = for k > N — Lj + 1). If the object is composite (i.e. consists of Lj 
basepairs extended symmetrically on both sides by L| m to take the nonzero linker lengths 
into account: Lj = Lj + 2L| m ), we may be interested in the partial occupancy by Lj 



basepairs at the center of the object. Equation (34) then becomes: 



o\= £ n= E H- (35) 

fc=i-(L j -l)-LJ m k=i-{Lj-L1 m ) 

Recursively, 

Q\ = 0\_l + P/_ L em — -P/„(£ j _^em- ) „ 1 - (36) 

Here we assume that all quantities on the right-hand side are set to zero if their indices 
are outside their definition domains (such as O J ). 

Finally, we need to take into account the fact that the objects can bind DNA in both 
directions, and thus there are two binding energies for each position: E\ (object j starts 
at % and extends in the 5' to 3' direction) and Ef- rc ' (object j starts at i + Lj — 1 and 
extends in the 3' to 5' direction). It is easy to show that the formalism developed above 
applies without change if the binding energies El are replaced by the free energies El 
which take both binding orientations into account: 

Ei = -\og\e- E '+e- E ' (rC) }. (37) 

In the case of a single type of DNA-binding object (such as the nucleosome energies 
predicted with DNABEND) there are two free parameters: the mean energy of nucle- 
osomes (E nuc ) over a chromosome or a given genomic region which affects overall nu- 
cleosome occupancy, and the standard deviation a(E nuc ) which plays the role of inverse 
temperature. Note that DNABEND nucleosomes are 157 bp long because a 147 bp hi- 
stone binding site is flanked on both sides by 5 bp long linkers which do not contribute 
to the energy. Because the background energy is set to zero, making (E nuc ) more neg- 
ative results in fewer bases being devoid of nucleosomes (this corresponds to increasing 
free concentration of histone octamers in the histone-DNA solution), and vice versa. The 
inverse temperature is related to the nucleosome stability: making <y(E nuc ) smaller results 
in fewer stable nucleosomes (e.g. those present in half or more of all the configurations in 



Equation (25)), while allowing for a bigger difference between favorable and unfavorable 
nucleosomal energies results in more configurations with 'frozen', stable nucleosomes. For 
all calculations in this paper we have chosen to set (E nuc ) = 0.0, a 2 (E nuc ) = 45.0 for every 
chromosome. The nucleosome energies with zero mean result in the average nucleosome 
occupancy of 0.797 over all chromosomes, and 12538 stable, non-overlapping nucleosomes 
(with P > 0.5) covering 16.3% of the yeast genome. In addition, this value of a(E nuc ) is 
close to the scale of DNABEND elastic energies. The free parameters described here are 



also present in the alignment model (albeit in a more implicit manner, see section 5.8) 
and are required in general for each type of DNA-binding object. 
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5.5 TF and TBP binding energies 



All TF binding energies were computed using position-specific weight matrices (PWMs) 
from Morozov et al. 16 and Maclsaac et al.: 17 



E TF {S) = -C^og^ (38) 



i=i 



where L TF is the length of the binding site, oij = {A, C, G, T} is the nucleotide at position 
i, is the frequency of base aij at position i in the PWM, and b ai is the background 
frequency of base af. {b A ,b c \b G ,b T } = {0.309,0.192,0.191,0.308} (the same background 



frequencies as used in the alignment-based nucleosome model, see section 5.8). C is an 
energy scale, set to 1 for all TFs. The distribution of energies is bounded from below by 
the energy of the consensus sequence E min which is a free parameter of the model since 
it depends on the TF concentration in solution. Thus the energy of a TF bound to site 
S is given by: E F f w (S) = E offset + (E TF (S) - E min ), where E offset is typically set to -6.0 
(relative to the 0.0 average energy of DNABEND nucleosomes) , but can be also varied 
through a range of energies (cf. Fig. 4d). Finally, the two energies of binding the same 
site in both directions are combined using: 

E TF (S) = - log [B w e~ E ^ + (1 - B w )e' E ^ s ^] , (39) 

where B w = 0.5 is the strand bias. A set of TF binding energies computed in this way 
(together with the nucleosome binding energy profile) is used as input to the recursive 



algorithm from Section 5.4, In practice, we restrict a set of sites to which TFs are allowed 
to bind by assigning a highly unfavorable energy to all TF positions that are not listed 
as predicted sites by Maclsaac et al. 17 

TBP binding energies were obtained in the same way, except that the TBP PWM was 
derived using an alignment of TATA box sites from Basehoar et al. 18 TBPs were only 
allowed to bind the sites listed in Basehoar et al. as TATA boxes. 



5.6 Nucleosome positions from tiled microarray analysis 

Yuan et al. have developed a hidden Markov model-based approach for inferring nucleo- 
some positions from high resolution tiled microarray data. 19 The data was collected for 
most of S.cerevisiae chromosome III, plus additional regulatory regions. We have used 
Yuan et al. "HMM calls" listed in the Supplementary Data to identify probes occupied 
by either a well-positioned or a fuzzy nucleosome (manually added "hand calls" were 
not considered). We then extended nucleosomal regions from the mid-point of the first 
occupied probe to the mid-point of the last occupied probe, plus additional 10 bp on 
each side (making it consistent with the definition of nucleosome coverage of TF bind- 
ing sites employed in Fig. 3 of Yuan et al.). 19 Thus the nucleosomal length is given 
by 20(iV — 1) + 20, where N is the number of contiguous probes occupied by a single 
nucleosome. We find that most nucleosomes occupy 6 contiguous probes and thus cover 
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120 bp, somewhat less than 147 bp expected from the structural point of view. The to- 
tal number of nucleosomes placed on chromosome III is 1045; every basepair covered by 
either well-positioned or fuzzy nucleosome was assigned an occupancy of 1.0, resulting in 
the overall average occupancy of 0.488 (that is, almost half of the genome is covered by 
the non-overlapping nucleosomes). The resulting occupancy profile and the corresponding 
nucleosome positions were used to compile all Yuan et al. results in this work. 

5.7 H2A.Z nucleosome positions from DNA sequencing 

Albert et al. have used high throughput DNA sequencing to identify genomic positions 
of H2A.Z nucleosomes in S.cerevisiae with a median 4 bp resolution. 20 We converted 
their nucleosome positioning data into the occupancy profile by using the "fine-grain" 
nucleosome center coordinates (reported for both strands in Albert et al. Supplementary 
Materials) to place 147 bp nucleosomes on the genome. Since many "fine-grain" nucle- 
osomes overlap, we end up with longer nucleosome-occupied regions containing multiple 
hits and surrounded by regions with zero nucleosome coverage. To each base pair in these 
occupied regions we then assign an integer equal to the number of nucleosomes that cover 
this particular base pair, thus producing an unnormalized nucleosome occupancy profile. 
Finally, the occupancy profile is normalized by rescaling the integer counts separately in 
each occupied region so that the highest occupancy is assigned a value of 1.0. The total 
number of overlapping H2A.Z nucleosome reads is 34796; with the conventions described 
above the average occupancy is 0.118 over all chromosomes. The resulting occupancy 
profile and the corresponding nucleosome positions were used to produce Albert et al. 
results for Supplementary Figs. 22 and 27b. 

5.8 Alignment-based model of nucleosome energies and occu- 
pancies 

Segal et al. have recently developed a nucleosome positioning model based on the align- 
ment of 199 sequences occupied by nucleosomes in vivo in the yeast genome. 15 The main 
assumption of the model is that it is possible to infer nucleosome formation energies using 
dinucleotide distributions observed in aligned nucleosomal sequences. To this end, N seq 
experimentally determined mononucleosome sequences of length ~ 147 bp and their re- 
verse complements were aligned around their centers as described in Segal et al. 15 The 
rules for choosing the center of the alignment should be clear from the following example: 

x 

AAGCGTTAAACGC seql 

GCGTTTAACGCTT seql.rc 

ATGCGACG seq2 

CGTCGCAT seq2_rc 

Here 'x' labels the center of the alignment, which is chosen in an obvious way for sequences 
with an odd number of nucleotides (seql), and by making the number of nucleotides to 
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the left of the central nucleotide one less than the number of nucleotides to the right of 
the central nucleotide for sequences with an even number of nucleotides (seq2). 

A local average over neighboring dinucleotide positions was carried out by shifting 
the original alignment of 2N seq by 1 bp to the right as well as 1 bp to the left, and 
adding the shifted sequences to the original alignment. Thus the final alignment has 
6N seq sequences. Next, the conditional probability of a nucleotide of type a at position i 
(S°) given a nucleotide of type (3 at position i — 1 (S^) is computed at every position 
within 67 bp of the center of the alignment: 



]\r(QaqP \ 

P(S?\S? J = , i , i = -67... 67 (40) 



where a, /?, 7 = {A, C, G, T}, N(SfSf_ l ) is the number of times the (3a dinucleotide occurs 
in the augmented alignment, and marks the alignment center. Keeping conditional 
probabilities only for 135 central positions disregards dinucleotide counts from the outer 
edges of the alignment to which fewer sequences contribute. The final model is 157 bp 
long because 135 conditional probabilities are flanked on each side by 11 background 
nucleotide frequencies, to approximate steric replusion between nucleosomes: 

P(Sy iSf-i) = P 9 {S a ), 1 = (-78.. - 68, 68..78) (41) 

where {P 9 (A), P 9 (C), P 9 (G), P 9 (T)} = {0.309012,0.191674,0.191303,0.308012} are the 
background frequencies from the yeast genome. Finally, the model assigns a negative log 
score (interpreted as the energy of nucleosome formation) to a 157 bp long nucleotide 
sequence S: 



78 p( qoti I 



-78 



where a* denotes the nucleotide type at position % in the sequence S, and P m (S a ) are either 
the genomic background frequencies defined above or uniform background frequencies: 
{P U (A),P U (C),P U (G),P U (T)} = {0.25,0.25,0.25,0.25} (i.e. m = {g,u}). Note that by 



construction the flanking nucleotides do not contribute to Equation (42) (apart from a 
possible difference in background frequencies), and thus play the role of embedding linker 
regions. Finally, to reproduce results from Segal et al. 15 the energies of both strands are 
combined using an empirical formula which implicitly sets the temperature, resulting in 
13978 stable, non-overlapping nucleosomes (with probability P > 0.5) that cover 18.2% 
of the yeast genome, and the average occupancy of 0.844 over all chromosomes: 

L I (S k ) = L 9 (S k ) + L u (St c) ), (43) 

where S k is a 157 bp sequence starting at position k in the genome, and is its reverse 
complement. Note that genomic background frequencies are used for one strand while 
uniform background frequencies are used for the other. We refer to the log scores computed 
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in this way, and the corresponding nucleosome probabilities and occupancies as alignment 
model I. Alternatively, in alignment model II we use genomic background frequencies 
for both DNA strands and rescale the temperature explicitly to match DNABEND and 
alignment model I: 



L"{S k ) = lA[L M (S k ) - (L° ld (S k ))] + (L M (S k )), (44) 



where L old (S k ) = L9{S k ) + L°(S { k rc) ) and (...) signifies an average over the chromosome. 



Note that while the log scores are rescaled in Equation (44), their mean stays the same 



(and close to zero because the background scores have been subtracted, cf. Equation 



(42)). We treat the log scores from Equations (43) and (44) as nucleosome energies, and 
use them to compute nucleosome probabilities and occupancies as described in section |5\4 
We find that alignment model II predictions do not strongly depend on the exact value of 
the scaling coefficient. Alignment model II results in the average nucleosome occupancy 
of 0.810 over all chromosomes; 26670 stable, non-overlapping nucleosomes (with P > 0.5) 
are placed, covering 34.7% of the yeast genome. The overall correlation between alignment 
models I and II is 0.66 for log scores and 0.60 for occupancies. 



5.9 Histone-DNA binding affinity measurements 

We used a standard competitive nucleosome reconstitution procedure to measure the rel- 
ative affinity of different DNA sequences for binding to histones in nucleosomes. 21 In this 
method, differing tracer DNA molecules compete with an excess of unlabeled competitor 
DNA for binding to a limiting pool of histone octamer. The competition is established 
in elevated [NaCl], such that histone-DNA interaction affinities are suppressed and the 
system equilibrates freely. The [NaCl] is then slowly reduced by dialysis, allowing nu- 
cleosomes to form; further reduction in [NaCl] to physiological concentrations or below 
then "freezes- in" the resulting equilibrium, allowing subsequent analysis, by native gel 
electrophoresis, of the partitioning of each tracer between free DNA and nucleosomes. 
The distribution of a given tracer between free DNA and nucleosomes defines an equilib- 
rium constant and a corresponding free energy, valid for that competitive environment. 
Comparison of the results for a given pair of tracer DNAs (in the identical competitive 
environment) eliminates the dependence on the details of the competitive environment, 
yielding the free energy difference (AAG) of histone-interaction between the two tracer 
DNAs. To allow for comparison with other work, we include additional tracer DNAs as ref- 
erence molecules: a derivative of the 5S rDNA natural nucleosome positioning sequence 21 
and the 147 bp nucleosome-wrapped region of the selected high affinity non-natural DNA 
sequence 601. 22 

The 5S and 601 reference sequences were prepared by PCR using plasmid clones as 
template. 146 and 147 bp long DNAs analyzed in X-ray crystallographic studies of nucleo- 
somes (PDB entries laoi and lkx5, respectively) were prepared as described 23 using clones 
supplied by Professors K. Luger and T.J. Richmond, respectively. New 147 bp long DNA 
sequences designed in the present study were prepared in a two-step PCR-based proce- 
dure using chemically synthesized oligonucleotide primers. All synthetic oligonucleotides 
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were gel purified prior to use. The central 71 bp were prepared by annealing the two 
strands. The resulting duplex was gel purified and used as template in a second stage 
PCR reaction to extend the length on each end creating the final desired 147 bp long 
DNA. The resulting DNA was again purified by gel electrophoresis. 

DNA sequences to be analyzed were 5' end-labeled with 32 P, and added in tracer 
quantities to competitive nucleosome reconstitution reactions. Reconstitution reactions 
were carried out as described 21 except that each reaction included 10 /xg purified histone 
octamer and 30 /ig unlabeled competitor DNA (from chicken erythrocyte nucleosome core 
particles) in the 50 /A microdialysis button. 

5.10 Hydroxy! radical footprinting of nucleosomal templates 

DNA templates. Plasmids pGEM-3Z/601, pGEM-3Z/603 and pGEM-3Z/605 contain- 
ing nucleosome positioning sequences 601, 603 and 605, respectively, were described pre- 
viously. 24 To obtain templates for hydroxyl radical footprinting experiments the desired 
~ 200-bp DNA fragments were PCR-amplified using various pairs of primers and Taq 
DNA polymerase (New England BioLabs). The sequences of the primers will be provided 
on request. To selectively label either upper or lower DNA strands one of the primers in 
each PCR reaction was 5'-end radioactively labeled with polynucleotide kinase and 7 32 P- 
ATP. 25 The single-end-labeled DNA templates were gel-purified and single nucleosomes 
were assembled on the templates by dialysis from 2 M NaCl. 25 Nucleosome positioning 
was unique on at least 95% of the templates. 

Hydroxyl radical footprinting. Hydroxyl radicals introduce non-sequence-specific 
single-nucleotide gaps in DNA, unless DNA is protected by DNA-bound proteins. 26 Hy- 
droxyl radical footprinting was conducted using single-end-labeled histone-free DNA or 
nucleosomal templates as previously described. 26 In short, 20-100 ng of single-end-labeled 
DNA or nucleosomal templates were incubated in 10 mM HEPES buffer (pH 8.0) in the 
presence of hydroxyl radical-generating reagents present at the following final concentra- 
tions (2 mM Fe(II)-EDTA, 0.6% H202, 20 mM Na-ascorbate) for 2 min at 20° C. Reaction 
was stopped by adding thiourea to 10 mM final concentration. DNA was extracted with 
Phe:Chl (1:1), precipitated with ethanol, dissolved in a loading buffer and analyzed by 
8% denaturing PAGE. 

Data analysis and sequence alignment. The denaturing gels were dried on What- 
man 3MM paper, exposed to a Cyclone screen, scanned using a Cyclone and quantified 
using OptiQuant software (Perklin Elmer). Positions of nucleotides that are sensitive to or 
protected from hydroxyl radicals were identified by comparison with the sequence-specific 
DNA markers (Supplementary Fig. 13). The dyad was localized by comparison of the 
obtained footprints with the footprints of the nucleosome assembled on human a-satellite 
DNA. The latter footprints were modeled based on the available 2.8 A resolution X-ray 
nucleosome structure. 27 
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6 Supplementary Tables 

Table 1. Predicted (F pred ) and measured (F exp ) free energies of computationally de- 
signed sequences. Experimental free energies are shown relative to the reference sequence 
from the L.variegatus 5S rRNA gene. Histone binding is dominated by the contribution 
from the H32H42 tetramer with the 71 bp binding site. 22 The best binder is created by 
using simulated annealing to introduce mutations and thus minimize the energy of a 71 
bp DNA molecule. B71S1 and B71S2 have different sequences flanking the 71 bp designed 
site (whose contribution dominates the total free energy). 601S1 and 601S2 consist of the 
71 bp site from the center of the 601 sequence 24 and flanking sequences from B71S1 and 
B71S2, respectively. W147S is a 147 bp sequence whose free energy (with contributions 
from multiple H32H4 2 binding sites) was maximized by simulated annealing. X146 and 
X147 are 146 bp and 147 bp DNA sequences from nucleosome crystal structures laoi 27 
and lkx5. 6 





Fpred (arb. units) 


F exp (kcal/mol) 


B71S1 


-20.09 


-1.57 ± 0.41 


B71S2 


-20.10 


-1.51 ± 0.27 


601S1 


-0.49 


-2.99 ± 0.55 


601S2 


-1.74 


-2.46 ± 0.18 


W147S 


15.26 


0.09 ± 0.23 


X147 


5.92 


0.75 ± 0.29 


X146 


6.86 


0.45 ± 0.91 
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Table 2. Average values of DNA geometric parameters for each dinucleotide type. Din- 
ucleotides are counted in both 5' and 3' directions of strand I (which is chosen arbitrar- 
ily). Note that by construction only tilt and shift switch signs when the dinucleotide 
direction is changed. 4 ' 5 N - total number of dinucleotides of a given type found in the 
non-redundant database of protein-DNA structures (see Supplementary Methods). An- 
gles (twist, roll, tilt) are measured in degrees (°), displacements (shift, slide, rise) are 
measured in Angstroms (A). 



Dinucleotide 


N 


Twist 


Roll 


Tilt 


Shift 


Slide 


Rise 


AA/TT 


163 


35.31 


0.76 


T-1.84 


T-0.05 


-0.21 


3.27 


AC/GT 


154 


31.52 


0.91 


T-0.64 


±0.21 


-0.54 


3.39 


AG/CT 


128 


33.05 


3.15 


T-1.48 


±0.12 


-0.27 


3.38 


CA/TG 


146 


35.02 


5.95 


T-0.05 


^0. 16 


0.18 


3.38 


CC/GG 


133 


33.17 


3.86 


±0.40 


±0.02 


-0.47 


3.28 


GA/TC 


138 


34.80 


3.87 


T-1.52 


^0.27 


-0.03 


3.35 


CG 


124 


35.30 


4.29 


0.00 


0.00 


0.57 


3.49 


GC 


102 


34.38 


0.67 


0.00 


0.00 


-0.07 


3.38 


AT 


174 


31.21 


-1.39 


0.00 


0.00 


-0.56 


3.39 


TA 


156 


36.20 


5.25 


0.00 


0.00 


0.03 


3.34 



Table 3. Standard deviations of DNA geometric parameters for each dinucleotide type. 
Dinucleotides are counted in both 5' and 3' directions of strand I (which is chosen arbi- 
trarily). N - total number of dinucleotides of a given type found in the non-redundant 
database of protein-DNA structures (see Supplementary Methods). 



Dinucleotide 


N 


Twist 


Roll 


Tilt 


Shift 


Slide 


Rise 


AA/TT 


163/163 


3.61 


4.55 


1.69 


0.32 


0.39 


0.19 


AC/GT 


154/154 


3.10 


3.91 


1.85 


0.48 


0.32 


0.20 


AG/CT 


128/128 


4.06 


4.07 


2.27 


0.51 


0.51 


0.24 


CA/TG 


146/146 


4.17 


4.67 


2.15 


0.56 


0.77 


0.26 


CC/GG 


133/133 


4.76 


4.37 


2.86 


0.67 


0.73 


0.27 


GA/TC 


138/138 


4.03 


3.78 


2.06 


0.48 


0.61 


0.20 


CG 


124 


4.49 


5.54 


2.42 


0.69 


0.56 


0.28 


GC 


102 


4.94 


4.37 


2.42 


0.65 


0.60 


0.20 


AT 


174 


3.04 


3.64 


1.25 


0.39 


0.27 


0.18 


TA 


156 


7.07 


6.17 


2.35 


0.48 


0.91 


0.21 
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Table 4. Force constant matrices for each dinucleotide type. Dinucleotides are counted in 
both 5' and 3' directions of strand I (which is chosen arbitrarily). By construction, tilt and 
shift switch signs when the dinucleotide direction is changed, 4 ' 5 inducing a corresponding 
sign change in the force constants involving these degrees of freedom. 





Twist 


Roll 


Tilt 


Shift 


Slide 


Rise 


AA/TT 


Twist 


0.130 


0.041 


±0.046 


±0.159 


-0.305 


0.739 


Roll 


0.041 


0.069 


±0.026 


±0.053 


-0.213 


-0.135 


Tilt 


±0.046 


±0.026 


0.406 


-0.408 


T0.073 


T-0.403 


Shift 


±0.159 


±0.053 


-0.408 


11.948 


±2.162 


T-2.511 


Slide 


-0.305 


-0.213 


T-0.073 


±2.162 


8.241 


-5.353 


Rise 


0.739 


-0.135 


T-0.403 


T-2.511 


-5.353 


35.715 


AC/GT 


Twist 


0.179 


0.053 


±0.015 


T-0.083 


-0.118 


1.244 


Roll 


0.053 


0.085 


±0.003 


±0.048 


0.049 


0.089 


Tilt 


±0.015 


±0.003 


0.323 


-0.312 


T0.298 


T-0.219 


Shift 


T0.083 


±0.048 


-0.312 


4.912 


±1.672 


T-0.872 


Slide 


-0.118 


0.049 


T-0.298 


±1.672 


10.089 


-3.277 


Rise 


1.244 


0.089 


T-0.219 


T0.872 


-3.277 


35.709 


AG/CT 


Twist 


0.113 


0.038 


T-0.001 


±0.168 


0.040 


0.850 


Roll 


0.038 


0.077 


±0.022 


±0.006 


-0.023 


0.067 


Tilt 


T-0.001 


±0.022 


0.280 


-0.365 


±0.252 


T-0.995 


Shift 


±0.168 


±0.006 


-0.365 


4.954 


±0.527 


±0.093 


Slide 


0.040 


-0.023 


±0.252 


±0.527 


4.516 


-2.966 


Rise 


0.850 


0.067 


T-0.995 


±0.093 


-2.966 


29.330 


CA/TG 


Twist 


0.098 


0.027 


T-0.034 


±0.061 


-0.254 


0.799 


Roll 


0.027 


0.059 


T-0.046 


±0.165 


-0.016 


0.202 


Tilt 


T0.034 


T0.046 


0.393 


-0.965 


±0.174 


T0.593 


Shift 


±0.061 


±0.165 


-0.965 


5.740 


T-0.117 


±1.963 


Slide 


-0.254 


-0.016 


±0.174 


T-0.117 


2.772 


-4.449 


Rise 


0.799 


0.202 


T-0.593 


±1.963 


-4.449 


23.870 
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Table 4 (continued). 





Twist 


Roll 


Tilt 


Shift 


Slide 


Rise 


CC/GG 


Twist 


0.114 


0.045 


±0.008 


^0.139 


-0.328 


1.587 


Roll 


0.045 


0.075 


^0.014 


±0.005 


-0.083 


0.797 


Tilt 


±0.008 


^0.014 


0.218 


-0.537 


^0.215 


±0.066 


Shift 


T-0.139 


±0.005 


-0.537 


3.917 


±0.085 


T-0.401 


Slide 


-0.328 


-0.083 


T-0.215 


±0.085 


3.795 


-7.972 


Rise 


1.587 


0.797 


±0.066 


T-0.401 


-7.972 


41.678 


GA/TC 


Twist 


0.133 


0.055 


±0.027 


±0.073 


-0.350 


1.301 


Roll 


0.055 


0.097 


±0.042 


T-0.099 


-0.158 


0.336 


Tilt 


±0.027 


±0.042 


0.408 


-1.012 


±0.016 


±0.139 


Shift 


±0.073 


^0.099 


-1.012 


10.434 


±3.814 


T-1-105 


Slide 


-0.350 


-0.158 


±0.016 


±3.814 


6.278 


-7.676 


Rise 


1.301 


0.336 


±0.139 


T-1.105 


-7.676 


41.988 


CG 


Twist 


0.101 


0.020 


0.000 


0.000 


-0.278 


0.806 


Roll 


0.020 


0.040 


0.000 


0.000 


0.001 


-0.046 


Tilt 


0.000 


0.000 


0.255 


-0.510 


0.000 


0.000 


Shift 


0.000 


0.000 


-0.510 


3.104 


0.000 


0.000 


Slide 


-0.278 


0.001 


0.000 


0.000 


3.991 


-2.936 


Rise 


0.806 


-0.046 


0.000 


0.000 


-2.936 


20.174 


GC 


Twist 


0.069 


0.006 


0.000 


0.000 


-0.217 


0.646 


Roll 


0.006 


0.057 


0.000 


0.000 


0.094 


0.168 


Tilt 


0.000 


0.000 


0.256 


-0.542 


0.000 


0.000 


Shift 


0.000 


0.000 


-0.542 


3.473 


0.000 


0.000 


Slide 


-0.217 


0.094 


0.000 


0.000 


4.030 


1.322 


Rise 


0.646 


0.168 


0.000 


0.000 


1.322 


34.392 


AT 


Twist 


0.189 


0.068 


0.000 


0.000 


0.111 


1.195 


Roll 


0.068 


0.124 


0.000 


0.000 


0.438 


-0.397 


Tilt 


0.000 


0.000 


0.641 


-0.043 


0.000 


0.000 


Shift 


0.000 


0.000 


-0.043 


6.670 


0.000 


0.000 


Slide 


0.111 


0.438 


0.000 


0.000 


15.942 


-2.611 


Rise 


1.195 


-0.397 


0.000 


0.000 


-2.611 


47.789 
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Table 4 (continued). 





Twist 


Roll 


Tilt 


Shift 


Slide 


Rise 


TA 


Twist 


0.112 


0.056 


0.000 


0.000 


-0.572 


1.466 


Roll 


0.056 


0.064 


0.000 


0.000 


-0.146 


0.318 


Tilt 


0.000 


0.000 


0.365 


-1.271 


0.000 


0.000 


Shift 


0.000 


0.000 


-1.271 


8.767 


0.000 


0.000 


Slide 


-0.572 


-0.146 


0.000 


0.000 


4.961 


-12.092 


Rise 


1.466 


0.318 


0.000 


0.000 


-12.092 


54.957 
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Twist -Q 




Supplementary Figure 7: Schematic illustration of a single dinucleotide (base step) 
geometry. Coordinate frames attached to basepairs i and i + 1 are shown in blue, and the MST 
coordinate frame is shown in green. For illustrative purposes, only rise D z and twist f2 are set to 
non-zero values. The origin of the MST frame is at the midpoint of the line connecting the origins 
of two base pair frames (which are separated by D z A along the z-axis); the MST frame is rotated 
through 0/2 with respect to the frame i. 
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Supplementary Figure 8: Elastic energy analysis based on the DNA conformation from 
the nucleosome crystal structure, a) Position-dependent sequence specificity in the nucleosomal 
DNA revealed by the energetic analysis of dinucleotides substituted into the crystal structure of the 
nucleosome core particle (PDB code: lkx5). 6 All possible dinucleotides were introduced at every 
position in the 147 bp nucleosomal site using DNA dihedral angles from the native dinucleotide, 
and DNA elastic energy was computed for every sequence variant. Upper panel: the difference 
between the energy of the most favorable dinucleotide and the average energy of all dinucleotides 
at this position. Lower panel: information score, defined as I(s) = log 2 (16) + Yll=i Pi ^°S.2(pt) • 
where pf = exp(— E?)/ Yll=i ex P( — Ef), and Ef is the elastic energy change which results from 
introducing dinucleotide of type i = 1 ..16 at position s: Ef = E^ mut ^ _ ^( wt ) j enforce the 
two-fold symmetry of the nucleosome core particle, all dinucleotide energies were symmetrized around 
the middle of the DNA site (shown as a dashed vertical line). Middle panel: roll angle of the ideal 
superhelix showing DNA geometry in relation to the histone octamer - by construction, negative 
roll angles correspond to the minor groove facing the histones. b) Elastic energy components for 
all possible dinucleotides substituted into the lkx5 crystal structure at position 109 where the DNA 
conformation is kinked (Supplementary Fig. 10). 6 Dinucleotides are ranked by their total energy 
as shown in the legend (lowest to highest energy from top to bottom). TA is the lowest energy 
dinucleotide (thick golden line) and is more favorable than the CA dinucleotide from the native 
structure. The energy component analysis reveals that it is the degrees of freedom related to slide 
(slide-slide and slide-twist components) and roll (roll-roll component) that make the TA dinucleotide 
most favorable, although the slide-slide component is slightly better in the native CA/TG dinucleotide 
(red/brown dots). In contrast, the AT dinucleotide has the highest energy due to its low flexibility 
with respect to roll, slide, and twist. The dinucleotide ranking is in agreement with the averages, 
standard deviations, and force constants of the roll, slide and twist degrees of freedom observed in 
the structural database of protein-DNA complexes (Supplementary Tables 2-4). 
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a 



Ideal Superhelix 



b 



1kx5 




Supplementary Figure 9: Schematic representation of the nucleosomal DNA con- 
formation, (a) - ideal superhelix, (b) - high-resolution crystal structure of the nucleosome core 
particle (PDB code: lkx5). 5 Each DNA base pair is shown as a rectangular block. Images of DNA 
conformations were created using 3DNA software. 2 
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Base step Base step 

Supplementary Figure 10: DNABEND-predicted and experimentally observed DNA 
geometries are correlated. Distributions of 6 DNA geometric degrees of freedom in the crystal 
structure of the nucleosome core particle 5 (PDB code: lkx5; Supplementary Fig. 9b) (blue), in the 
minimum energy structure obtained starting from the ideal superhelix and using lkx5 DNA sequence 
as input to DNABEND (red), and in the ideal superhelix with no energy relaxation (Supplementary 
Fig. 9a) (green). The two-fold nucleosome symmetry axis is shown using dashed vertical lines. Mean 
values of geometric degrees of freedom in the ideal superhelix are shown as dashed horizontal lines. 
Correlation coefficients between the geometric distributions from the native and minimized structures 
are: {r twisU r roU , r m , r sUde , r shift , r rise ) = (0.489,0.709,0.539,0.536,0.247,0.238) «r) = 0.460). 
Black curve in the slide panel is a DNABEND prediction based on a modified elastic potential 
in which the mean slide for the CA dinucleotide was increased from 0.18 (inferred from a set 
of non-nucleosome protein-DNA complexes and used in this work) to 0.91 (inferred from a lim- 
ited set of available nucleosome structures). Note that the CA dinucleotides occur at positions 
(3,10,16,26,50,53,60,66,77,85,102,109,115,133) in the lkx5 crystal structure. 6 Whereas the slide 
correlation coefficient between the native and minimized structures remained essentially the same 
at 0.533, the absolute magnitude of predicted slide peaks is in better correspondence with the ob- 
served values. Thus an elastic potential trained on non-nucleosome complexes may not reproduce 
all structural aspects of highly bent nucleosomal DNA. 
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Supplementary Figure 11: DNABEND reproduces experimentally observed couplings 
between DNA degrees of freedom. DNA geometric degrees of freedom are strongly coupled 
in nucleosomal DNA compared with other protein-DNA complexes. Shown as heat maps are the 
matrices of absolute values of correlation coefficients for DNA geometric degrees of freedom (white: 
correlation of 1.0, red: little or no correlation), a) Correlations between DNA geometric parameters 
observed in the crystal structure of the nucleosome core particle (PDB code: lkx5). 6 b) Correlations 
found in the structural database of non-homologous protein-DNA complexes 8 used to build the elastic 
energy potential, c) Correlations found in the DNA conformation predicted by DNABEND starting 
from the ideal superhelix and using lkx5 DNA sequence as input. Note that many off-diagonal 
couplings observed in (a) are reproduced. 
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DNABEND: r = Q .77/0.83 
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Supplementary Figure 12: DNABEND accurately predicts free energies of nucleosome 
formation and ranks nucleosome sequences, a) Predictions of in vitro free energies of nucle- 
osome formation for a set of natural and synthetic sequences measured using nucleosome dialysis 
(red circles) 28 and nucleosome exchange (green circles). 29,30 Upper panel: DNABEND free energy 
predictions (with experimental error bars when available). Lower panel: free energy predictions for 
the same dataset using a bioinformatics model based on the alignment of sequences extracted from 
yeast mononucleosomes (the alignment model, see Supplementary Methods). 15 Artificial high affin- 
ity sequence 601 22,28 is highlighted in black. The free energies were computed using only the central 
71 bp of the nucleosomal site: F = - log e ~ Ei + e~ E ^ C> ), where L is the length of 

each sequence, L w = 71 bp is the length of DNA bound by the H32H42 tetramer (bp 39 through 
109 in the 147 bp superhelix), and E%, E^f c ^ are the DNA elastic energies of bending the 71 bp 
long site into the superhelical shape at sequence positions i . . . i + 70 for the forward strand {Ei), 
and i + 70...i for the reverse strand (E^). In the alignment model, log-scores with genomic 
background frequencies were used instead of energies (L 9 , cf. Equation (42)). b) Histograms of 
DNA elastic energies/alignment model log scores for the mouse genome sequences selected for their 
ability to position nucleosomes (red), 31 or to impair nucleosome formation (blue). 32 Upper panel: 
DNABEND (with w = 0.5), lower panel: alignment model with genomic background frequencies. 15 
In sequences longer than 71 bp the most favorable energy/log score was used, taking both forward 
and reverse strands into account. We assume that in all nucleosome reconstitution experiments 
described above the selective pressure was exerted mainly on the central stretch of the nucleosomal 
DNA which interacts with the H3 2 H4 2 tetramer. 22 
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Supplementary Figure 13: Hydroxyl radical footprinting of the 601, 603 and 605 
nucleosomal templates. Single-end-labeled histone-free DNA (D) or nucleosomes (N) were exposed 
to hydroxyl radicals, DNA was purified and analyzed by denaturing PAGE (representative gels are 
shown). The lower and upper DNA strands are shown in Supplementary Fig. 14. Nucleosome 
positions are indicated by ovals; the positions of nucleosomal dyads in the gels are indicated by 
diamonds. The A, T and G markers were obtained by primer extension using Taq DNA polymerase and 
terminating dideoxynucleotide triphosphates (ddATP (A), ddTTP (T), and ddGTP (G), respectively). 
M: end-labeled pBR322-Mspl digest (DNA size markers). 
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Supplementary Figure 14: Schematic description of the results of hydroxyl radical 
footprinting of the 601, 603 and 605 nucleosomal templates. The sites accessible to and 
protected from hydroxyl radicals are shown in bold and regular letters, respectively. The accessible 
and protected nucleotides were identified by quantitative analysis of hydroxyl radical footprinting 
patterns (Supplementary Fig. 13). The positions of nucleosomal dyads are indicated by diamonds. 
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Supplementary Figure 15: DNABEND predictions of in vitro nucleosome positions. 

Probability of a nucleosome to start at each base pair (green), nucleosome occupancy (blue), and 
nucleosome formation energy (violet). Vertical lines: experimentally known nucleosome starting 
positions, with bp coordinates listed in brackets below, (a) The 180 bp sequence from the sea urchin 
5S rRNA gene [bps 8, 26]. 33 (b) The 183 bp sequence from the pGUB plasmid [bps 11, 31]. 34 (c) The 
215 bp fragment from the sequence of the chicken (3— globin A gene [bp 52]. 35 (d,e,f) Synthetic high- 
affinity sequences 24 601 [bp 61], 603 [bp 81], and 605 [bp 59]. Experimentally known nucleosome 
starting positions are listed in a one-based coordinate system with consecutively numbered base 
pairs. Nucleosomes on sequences 601, 603, and 605 were mapped by hydroxyl radical footprinting 
(Supplementary Figs. 13 and 14). All DNA sequences used in this calculation are available on the 
Nucleosome Explorer website: http://nucleosome.rockefeller.edu. 
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Supplementary Figure 16: DNABEND-selected sequences exhibit periodic dinucleotide 
patterns. Relative fractions of AA/TT, TA, AT, and GC dinucleotides in the alignment of 1000 
147 bp sequences from the S.cerevisiae chromosome IV. Green: sequences with lowest DNA elastic 
energies, red: sequences with highest DNA elastic energies, dark blue: randomly picked sequences. 
DNA elastic energies were predicted by DNABEND. 
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Supplementary Figure 17: Autocorrelation functions show periodicity in nucleosome 
occupancies, energies, and log scores. Autocorrelation function for nucleosome occupancies com- 
puted using DNABEND (a) and alignment model I (b). 15 Autocorrelation function for nucleosome 
energies computed using DNABEND (c), and for log scores computed using alignment model I, 
Equation (43) (d). 15 Nucleosome occupancies are periodic due to steric exclusion, while energies 
and log scores are periodic due to DNA helical twist. 
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Supplementary Figure 18: Cross-correlation functions reveal low correlation between 
DNABEND and alignment model predictions. Cross-correlation functions between energies/log 
scores (a) and occupancies (b) computed using DNABEND and alignment model I. Both correlations 
are negative when computed at zero offset. 
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Supplementary Figure 19: Average nucleosome occupancies and standard errors for 
S.cerevisiae genomic regions, (a) - DNABEND, (b) - alignment model I, 15 (c) - alignment model 
II. 
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Supplementary Figure 20: Stable nucleosomes are arranged in a regular array induced 
by the periodic occupancy profile. Center-to-center distance counts for proximal stable nucleo- 
somes (P > 0.5) predicted using DNABEND (a) and alignment model I (b). Red: null model in 
which the same number of stable nucleosomes is randomly positioned on the genome without over- 
lap. Solid red lines are mean values for 100 random placements, dashed red lines are one standard 
deviation away. Alignment model II exhibits similar oscillations (data not shown). 
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Supplementary Figure 21: Average nucleosome occupancy plotted with respect to 
translation start sites. Green - DNABEND nucleosomes only (same as in Fig. 3a), cyan - 
DNABEND nucleosomes competing with TFs allowed to bind to functional sites, 17 red - DNABEND 
nucleosomes competing with TBPs (same as Fig. 3a), maroon - DNABEND nucleosomes competing 
with both TFs and TBPs. Dark blue - a non-specific nucleosome model (all nucleosome positions 
assigned a constant energy) competing with TBPs. 
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Supplementary Figure 22: Average H2A.Z nucleosome occupancy plotted with respect 
to translation start sites. The occupancy profile was derived as described in the Supplementary 
Methods, starting from H2A.Z nucleosome positions determined with high throughput DNA se- 
quencing by Albert et a/. 20 Note that nucleosome occupancy profiles were constructed by giving 
an approximately equal weight to each gene, regardless of the total number of sequence reads in 
its vicinity. This procedure results in occupancy profiles that are smoother than those based on 
weighting all sequence reads equally, as shown in Albert et al. 20 
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Supplementary Figure 23: Average nucleosome energies and log scores plotted with 
respect to translation start sites. Dark blue: DNABEND, maroon: alignment model I, dark 
red: alignment model II. Nucleosome stability in the upstream regions is higher than average with 
DNABEND and alignment model I, and lower than average with alignment model II. 
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Supplementary Figure 24: Difference in the average nucleosome occupancy of func- 
tional and non-functional TF binding sites. 17 The difference is negative if the functional sites are 
less covered by nucleosomes. Green squares indicate statistically significant occupancy differences 
(P < 0.05). (a) Nucleosomes only, alignment model I, 15 (b) nucleosomes only, alignment model 
II, (c) nucleosomes competing with TFs. In (c), non-functional sites are the same as in Fig. 4a, 
whereas the positions of functional sites have been randomized within intergenic regions (we retained 
TF binding energy distributions of functional sites after randomization, disregarding actual TF bind- 
ing specificities for the purposes of this test). Error bars correspond to 20 different realizations of 
functional site randomization. Note that with DNABEND functional binding sites are covered by 
nucleosomes if their positions are randomized, whereas according to alignment models functional 
binding sites are intrinsically uncovered. As in Fig. 4a, we only consider TFs for which at least 20 
binding sites are available in each (functional or non-functional) set. 
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Supplementary Figure 25: Spatial distribution and binding energies of functional and 
non-functional TF binding sites, (a) Histogram of TF binding energies in functional (blue) and 
non-functional (red) sites. 17 Zero energy corresponds to the most favorable (consensus) sequence, 
(b) Histogram of TF binding site positions with respect to the translation start site. Blue - functional 
sites, red - non-functional sites, green - randomized functional sites, (c) Histogram of the center- 
to-center distances between proximal TF sites shows TF site clustering. Dark red - functional sites, 
dark blue - randomized functional sites. Error bars are too small to be shown in the latter graph. 
Randomized functional sites are as in Supplementary Fig. 24. 
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Supplementary Figure 26: Predictions of nucleosome positions known from the liter- 
ature: specific genomic loci. Orange ovals: experimentally mapped nucleosomes. Upper panel: 
DNABEND-predicted nucleosome occupancy, both with (maroon) and without (blue) other DNA- 
binding factors. Lower panel: TBP occupancy (green) and TF occupancy (dark blue, dark orange, 
pink). Black arrows indicate translation start sites. To compute DNA-binding energies of TFs we 
scanned intergenic regions using PWMs from Morozov et al. 16 or, if those were not available, from 
Maclsaac et al. 17 (a few PWMs had a different origin as discussed below in individual Figure cap- 
tions). TBP PWM was derived using an alignment of TATA box sites from Basehoar et al.; 18 since, 
unlike TFs, there were too many false positives we further restricted regions where TBP could bind 
using information from the literature, and omitted TBP binding altogether if no such regions could 
be found. The TF binding energy to the consensus (lowest energy) sequence was typically set to -9.0 
kcal/mol, whereas for TBP it was always -6.0 kcal/mol (cf. section 5.5 of Supplementary Methods). 
Finally, we corrected obvious discrepancies between measured and predicted nucleosome occupancies 
by changing the value of the chemical potential (the concentration of the unbound histone octamer) 
in several cases, notably for HMLa, HMR& and recombination enhancer loci, and by omitting the 
5bp linkers from the nucleosome model. These changes affected only maroon curves. All predictions 
shown here are summarized in Figure 5 from the main text. 
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Supplementary Figure 26a: Detailed view of the GAL1-10 locus, with experimental nucle- 
osome positions mapped by Li et a/. 36 Green: TBP, dark blue: GAL4. Several GAL4 factors bind 
closely spaced sites and act cooperatively to displace a single nucleosome. Once the nucleosome is 
removed TBP is free to bind TATA sequences in the nucleosome-depleted region. 



65 



CHA1 




16200 16400 16600 16600 17000 17200 17400 17600 17800 18000 

Chromosome 3 



< — I 




i > 


CHA1 




VAC17 



Supplementary Figure 26b: Detailed view of the CHA1 locus, with experimental nucleosome 
positions mapped by Moreira et al. 37 Green: TBP, dark blue: CHA4, dark orange: ABF1. 
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Supplementary Figure 26c: Detailed view of the PH05 locus, with experimental nucleosome 
positions mapped by Horz et a/. 38 " 40 Green: TBP, dark blue: PH02, dark orange: PH04. 
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BAR1 




Supplementary Figure 26d: Detailed view of the BAR1 locus, with experimental nucleosome 
positions mapped by Shimizu et al. A1 Green: TBP, dark blue: MATa2 — MCM1 — MATa2 (with 
PWM constructed using a combination of a TRANSFAC MCM1 PWM and a structure-based lmnm 
PWM 16 ). Note that the experimentally mapped nucleosomes are phased off the region occupied by 
MATa2 - MCM1 - MATa2; however, in the DNABEND simulation nucleosomes move to the left 
instead, perhaps because we overestimate the stability of the nucleosome starting at bp 322200. 
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Supplementary Figure 26e: Detailed view of the STE6 locus, with experimental nucleosome 
positions mapped by Shimizu et al. 41 Dark blue: MATa2 - MCM1 - MATa2 (same PWM as in 
Supplementary Fig. 26d). Note that in contrast to the BAR1 locus (Supplementary Fig. 26d), 
nucleosomes are shifted in the right direction when MATa2 — MCM1 — MATa2 binds, phasing off 
the TF-occupied region. 
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Supplementary Figure 26f: Detailed view of the HMLa locus, with experimental nucleo- 
some positions mapped by Weiss et al. 42 Dark blue: origin recognition complex (PWM based on 
the consensus sequence from Ref. 43 ), dark orange: ABF1, pink: RAP1. Note that two prominent 
gaps in the otherwise regularly spaced array of experimentally mapped nucleosomes coincide with 
the regions of high TF occupancy. 
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Supplementary Figure 26g: Detailed view of the HMRa locus, with experimental nucle- 
osome positions mapped by Ravindra et al. AA Dark blue: origin recognition complex (PWM based 
on the consensus sequence from Ref. 43 ), dark orange: ABF1, pink: RAP1. Similar to the HMLa 
locus (Supplementary Fig. 26f), two prominent gaps in the otherwise regularly spaced array of 
experimentally mapped nucleosomes coincide with the regions of high TF occupancy. 
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Supplementary Figure 26h: Detailed view of the recombination enhancer locus, with exper- 
imental nucleosome positions mapped by Weiss et al. 45 Dark blue: MATa2 — MCM1 — MATa2 
(same PWM as in Supplementary Fig. 26d). Similar to the HMLct and HMR& loci (Supplementary 
Figs. 26f,g), two prominent gaps in the otherwise regularly spaced array of experimentally mapped 
nucleosomes coincide with the regions of high TF occupancy. 
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ADH2 




Supplementary Figure 26i: Detailed view of the ADH2-UBP15 locus, with experimental 
nucleosome positions mapped by Verdone et al. 45 Green: TBP, dark blue: ADR1. 
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Supplementary Figure 26j: Detailed view of the HIS3-PET56-DED1 wildtype locus. Upper 
panel: DNABEND-predicted nucleosome occupancy, both with (maroon) and without (blue) other 
DNA-binding factors. Lower panel: TBP occupancy (green), GCN4 occupancy (dark blue), and 
ABF1 occupancy (pink). Orange ovals: experimental nucleosome positions mapped by Sekinger et 
al. 47 Black arrows indicate translation start sites. We used a GCN4 weight matrix from Morozov 
et al. and an ABF1 weight matrix from Maclsaac et a/. 17 to compute TF DNA-binding energies. 
The TBP weight matrix was derived using an alignment of TATA box sites from Basehoar et al. 18 
Note that the region around bp 721800 is intrinsically devoid of nucleosomes, 47 allowing TFs to bind 
in the nucleosome-free gap. 
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Supplementary Figure 26k: Detailed view of the HIS3-PET56 wildtype locus (zoom-in of 
(j)), with experimental nucleosome positions mapped by Sekinger et al. AJ Green: TBP, dark blue: 
GCN4, pink: ABF1. Note that the region around bp 721800 is intrinsically devoid of nucleosomes, 47 
allowing TFs to bind in the nucleosome-free gap. 
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Supplementary Figure 261: Detailed view of the HIS3-PET56 XY210 sequence deletion 
locus, with experimental nucleosome positions mapped by Sekinger et al. 47 Green: TBP, dark blue: 
GCN4. Note that DNABEND likely overpredicts the stability of the nucleosome centered at bp 
721800. As a result, this nucleosome is not displaced in competition with TFs (whose binding 
energies may also be underestimated). 
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Supplementary Figure 26m: Detailed view of the HIS3-PET56 XY204 sequence deletion 
locus, with experimental nucleosome positions mapped by Sekinger et al. 47 Green: TBP, dark blue: 
GCN4. Note that unlike the case of the HIS3-PET56 wildtype locus (Supplementary Fig. 26k), TF 
occupancy improves correlation with experiment by moving nucleosomes to the left of the TF-covered 
region. 
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Supplementary Figure 26n: Detailed view of the HIS3-PET56 XY205 sequence deletion 
locus, with experimental nucleosome positions mapped by Sekinger et al. 47 Dark blue: GCN4. Note 
that the intrinsic gap around bp 721800 is enlarged through the action of GCN4, though its width is 
still insufficient compared to experiment. 
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Supplementary Figure 27: Predictions of nucleosome positions mapped in genome- 
wide assays. Fraction of correctly predicted nucleosomes as a function of the minimal center- 
to-center distance between predicted nucleosomes and (a) nucleosomes mapped using Yuan et al. 
microarray assay, 19 (b) H2A.Z nucleosomes mapped using Albert et al. DNA sequencing approach. 20 
In the null (random) models, non-overlapping nucleosomes are placed on the genome with probability 
P = 1.0. Distances between neighboring nucleosomes are sampled from a Gaussian distribution 
with the mean chosen to reproduce the average occupancy predicted by the actual model, and the 
standard deviation set to 0.5 of the mean (negative linker lengths are not allowed). Thus in the null 
models nucleosomes are positioned non-specifically but on average form a regular array. The only 
parameter borrowed from the actual model is the predicted average occupancy. The center-to-center 
nucleosome distance is determined by locating in the periodic occupancy profile the nearest region 
with occupancy of at least 0.35 over the nucleosomal length. Error bars that correspond to 100 
random realizations of the null models are too small to be shown. Note that in both (a) and (b) 
red and cyan curves nearly overlap; adding TF and TBP binding to the alignment models does not 
improve their performance (data not shown). 
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